Subtle Concept of pointers and array - c

I have a very strange example involving pointers, and I would appreciate your help. In general, a pointer points to a variable (see the first example below) and must be dereferenced to get the value. But when it points to an array, I don't understand why it no longer requires dereferencing to obtain the array elements (see the second example below).
printf("TEST: %i\n", x[i]);// I expect this should be *x[i]
This is indeed very strange. Is it just a C convention or how do you explain this?
EDIT: Since many have provided a clear answer, I want to add a small follow-up question. As all of you mentioned, x[i] = *(x+i); what about x[i][j] for a 2-dimensional array? What is it equivalent to? Does it require dereferencing?
With a normal variable
int j = 4;
int* pointerj=&j;//pointerj holds address of j or points to j
printf("%d",*pointerj); // prints the value that pointerj points to
With an array :
#include <stdio.h>
#include <stdlib.h>
int *function(unsigned int tags) {
int i;
int *var;
var = (int*)malloc(sizeof(int)*tags); // malloc returns a pointer to dynamically allocated memory, cast here to int*
// so now var holds the address of a dynamic array.
for (i = 0; i < tags; i++) {
var[i] = i;
}
return var;
}
int main() {
int *x;
int i;
x = function(10);
for (i = 0; i < 10; i++) {
printf("TEST: %i\n", x[i]);// I expect this should be *x[i]
}
free(x); x = NULL;
return 0;
}

don't understand why it no longer requires dereferencing to obtain the array (element)
Well, you are dereferencing, it's just not using the dereference operator *.
The array subscripting operator [] does the job of dereferencing here. Quoting C11, chapter §6.5.2.1:
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
It's syntactic sugar. The expressions x[i] and *(x+i) are equivalent. The latter satisfies your expectation, whereas the former disguises the dereference operator but does exactly the job you expected.
That said, also follow the data types closely. What you were expecting, something along the lines of *x[i], would be invalid here, as it boils down to *(*(x+i)). Now,
x is of type int * (a pointer to int; an array of int would be converted to the same type).
x+i gives you a result of type pointer to int.
The inner dereference operator is applied to it, resulting in an int.
The outer * will now try to operate on an operand of type int (not a pointer). This is a constraint violation that the compiler must diagnose, as the constraint for the dereference operator says,
The operand of the unary * operator shall have pointer type.
in C11, chapter §6.5.3.2.
Answering the additional question:
Let me use the quote, once again, this one's from paragraph 2, chapter §6.5.2.1
Successive subscript operators designate an element of a multidimensional array object.
If E is an n-dimensional array (n ≥ 2) with dimensions i × j × . . . × k, then E (used as
other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with
dimensions j × . . . × k. If the unary * operator is applied to this pointer explicitly, or
implicitly as a result of subscripting, the result is the referenced (n − 1)-dimensional
array, which itself is converted into a pointer if used as other than an lvalue. It follows
from this that arrays are stored in row-major order (last subscript varies fastest).
Consider the array object defined by the declaration
int x[3][5];
Here x is a 3 × 5 array of ints; more precisely, x is an array of three element objects, each of which is an
array of five ints. In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to
a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually
entails multiplying i by the size of the object to which the pointer points, namely an array of five int
objects. The results are added and indirection is applied to yield an array of five ints. When used in the
expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j]
yields an int.

This is ok. No * there.
If you have a pointer int *a which points to an array of valid data, for example:
int *a;
int arr[5] = {1, 2, 3, 4, 5};
a = arr;
By using a[0] you are already dereferencing the pointer, and * is not required because a[0] is the same as *(a + 0).
You can go further and make the code even more confusing for readers.
You may use i[x] in your example instead of x[i]. Why?
x[i] is equal to *(x + i), and since addition is commutative it is also equivalent to *(i + x), which is how subscripting is defined in C.
It works even with constants: x[3] and 3[x] give you the same result.

In C, x[i] is equivalent (yeah, I know) to i[x]. Your compiler treats both as
*(x+i), where you find your favorite symbol again.
x+i is a pointer that starts from x and advances by i elements. The result takes the pointed-to type into account: x+1 advances the address by sizeof(char) bytes for a char* but by sizeof(int) bytes for an int*.

You misunderstood how arrays work. Indeed, with a variable such as an int you access its address with
int x;
int *p2x = &x;
but with an array, it's a bit different. int y[SOME_SIZE]; is a chunk of bytes in memory; *y takes you to the first element in that memory location and dereferences it, and y[0] does the same. *y[0] would first take the value in y[0] and then try to dereference it... not what you usually want :(

The simple answer to your question is:
In most expressions, the name of an array is converted to a pointer to the first element of the array.

Related

Pointer operation yields unexpected result

I was expecting the code below to print 4 (since a float is 4 bytes), but it prints 1. Would someone explain why this happens?
#include <stdio.h>
int main()
{
float a[4]={0.0,0.1,0.2,0.3};
printf("%d", &a[1]-&a[0]);
return 0;
}
First of all, change
printf("%d", &a[1]-&a[0]);
to
printf("%td", &a[1]-&a[0]);
because subtracting two pointers yields a result of type ptrdiff_t, and %td is the conversion specifier for that type.
That said, quoting C11, chapter §6.5.6, subtraction operator (emphasis mine)
When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements. [....] In
other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of
an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t. [....]
In your case, P is &a[1] and Q is &a[0], so i is 1 and j is 0. Hence the result of the subtraction operation is i-j, i.e., 1-0, 1.
You are correct that the two pointers are 4 bytes apart. And if you were subtracting two integers you'd get 4. But &a[1] and &a[0] are of type float *. Pointer arithmetic in C takes into account the size of the thing being pointed to, so &a[1]-&a[0] is 1.
This is the basic means by which array indexing works. You can take advantage of this to iterate through an array without needing a separate index and instead terminating on a boundary such as NaN.
#include <stdio.h>
#include <math.h>
int main()
{
float a[] = { 0.0,0.1,0.2,0.3,NAN };
float *iter = a;
while(!isnan(*iter)) {
printf("%f\n", *iter);
iter++;
}
}
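If you instead cast the pointers to an integer type you will, on typical platforms, get the distance in bytes, which is 4 when sizeof(float) is 4. Note that unsigned int can be too narrow to hold an address on 64-bit platforms; uintptr_t from <stdint.h> is the safer choice.
printf("%ju\n", (uintmax_t)((uintptr_t)&a[1] - (uintptr_t)&a[0]));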

Why don't we have to pass the row size of a multidimensional array to a function?

What is the reason, that the first index in the function void transposeMatrix(int a[][arraySize]) is empty?
Because what transposeMatrix receives really isn't a 2D array, but rather a pointer to a 1D array.
Except when it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
If you declare an array as
int arr[N][M];
and pass it to a function
foo( arr );
then the expression arr is converted from type "N-element array of M-element array of int" to "pointer to M-element array of int" (int (*)[M]).
In a function parameter declaration, T a[N] and T a[] are "adjusted" to T *a - IOW, all three declare a as a pointer to T (this is only true for function parameter declarations).
Thus,
void transposeMatrix(int a[][arraySize])
is equivalent to
void transposeMatrix(int (*a)[arraySize])
Using a[][M] rather than (*a)[M] is a notational convenience (similar to using p->m instead of (*p).m for accessing struct and union members through pointers).
Because of how array indexing works, you can index into a like any other 2D array. Remember that the subscript operation a[i] is defined as *(a + i) - given a starting address a, find the address of the i'th element (not byte) following a and dereference the result. So:
(*a)[i] == (*(a + 0))[i] == (a[0])[i] == a[0][i]
meaning
(*(a + j))[i] == (a[j])[i] == a[j][i]
When you have an array a of n objects of type foo, the compiler can calculate the location of element a[i] by adding i times the size of a foo object to the starting address of a.
Note that this calculation involves only the size of foo and the subscript i. It does not involve the number of elements n. The compiler does not need to know the number of elements in order to calculate where an element is.
You would need to know the number of elements in an array to avoid going beyond the end of an array. However, in C, it is not the compiler’s job to guard against going beyond the end of the array. The author of the function is responsible for doing that. So, for this purpose, the compiler does not need to know how many elements are in an array.
Now suppose each foo object is itself an array of m objects of type bar. The compiler does need to know the size of a foo object. How big is a foo? Since a foo is an array of m objects of type bar, its size is m times the size of bar. So, to know how big a foo is, the compiler needs to know how many elements are in the array.
Thus, when passing an int a[FirstDimension][SecondDimension], the compiler does not need to know FirstDimension but does need to know SecondDimension.

Is a[n] really interchangeable with *(a+n) - why does sizeof return two different answers?

I'm having a problem understanding one thing in C. I've read in "ANSI C" that statements like a[n] where a is array are really equivalent to *(a+n). So here's a small code snippet I've written to check that:
#include <stdio.h>
int main(void)
{
int a[10] = {0,1,2,3,4,5,6,7,8,9};
int *p = a;
printf("sizeof a: %zu\n", sizeof(a));
printf("sizeof p: %zu\n", sizeof(p));
return 0;
}
After executing the code the program outputs:
sizeof a: 40
sizeof p: 8
I don't understand - what did I just do? How are a and p different objects? (Judging by the output of sizeof function)
According to the C Standard (6.5.3.4 The sizeof and alignof operators)
2 The sizeof operator yields the size (in bytes) of its operand,
which may be an expression or the parenthesized name of a type. The
size is determined from the type of the operand. The result is an
integer. If the type of the operand is a variable length array type,
the operand is evaluated; otherwise, the operand is not evaluated and
the result is an integer constant.
So object a defined as an array of 10 integers occupies a memory extent that has size of 40 bytes because each element of it in turn has size of 4 bytes (that is in your environment sizeof( int ) is equal to 4).
int a[10] = {0,1,2,3,4,5,6,7,8,9};
Object p is defined as a pointer and initialized by the address of the first element of the array a
int *p = a;
In the environment where you compiled the program size of pointer is equal to 8 bytes.
You can consider declaration
int *p = a;
also like
int *p = &a[0];
Yes, the expression a[n] is equivalent to *(a+n). But that's not relevant to your code sample.
You've defined a as an array object consisting of 10 int objects, and p as a pointer object of type int*. The equivalence you mention means, for example, that the expression a[5] is equivalent to *(a+5); both are int expressions with the value 5. But neither expression appears in your code.
You've defined a and p as objects of different types, and there is no rule in C that says these objects are interchangeable. sizeof a is equivalent to 10 * sizeof (int), and sizeof p is equivalent to sizeof (int*) (the size of a pointer to int).
One special rule that your program does depend on is that an expression of array type is, in most contexts, implicitly converted to a pointer to the array's initial element. In:
int *p = a;
a is an expression of array type. It's converted to the equivalent of &a[0], and that pointer value is used to initialize p. Thereafter, a and p are equivalent in most contexts; they both refer (one indirectly, one directly) to the address of the initial element of the array object a. But the operand of the sizeof operator is one of the contexts in which this conversion does not take place; sizeof a yields the size of the array object, not the size of a pointer.
All this is explained very well in section 6 of the comp.lang.c FAQ.

C pointer to pointer

Does
int **p
and
int *p[1]
mean the same thing? Both can be passed to functions that change the pointer object, and both can be accessed via p[0] or *p.
Update: thanks for your help, though memory management seems different. Does the access mechanism remain the same?
E.g., p[0] becomes *(p+0) and *p (both pointing to something).
Thanks
Not quite.
int **p;
declares a pointer p, which will be used to point at objects of type int *, ie, pointers to int. It doesn't allocate any storage, or point p at anything in particular yet.
int *p[1];
declares an array p of one pointer to int: p's type can decay to int ** when it's passed around, but unlike the first statement, p here has an initial value and some storage is set aside.
Re. the edited question on access syntax: yes, *p == p[0] == *(p+0) for all pointers and arrays.
Re. the comment asking about sizeof: it deals properly with arrays where it can see the declaration, so it gives the total storage size.
void foo()
{
int **ptr;
int *array[10];
sizeof(ptr); // just the size of the pointer
sizeof(array); // 10 * sizeof(int *)
// popular idiom for getting count of elements in array:
sizeof(array)/sizeof(array[0]);
}
// this would always discard the array size,
// because the argument always decays to a pointer
size_t my_sizeof(int *p) { return sizeof(p); }
To simplify things, you could factor out one level of pointers since it's not relevant to the question.
The question then becomes: what's the difference between T* t and T t[1], where T is some type.
There are several differences, but the most obvious one has to do with memory management: the latter allocates memory for a single value of type T, whereas the former does not (but it does allocate memory for the pointer).
They are not the same thing, although in many cases they can appear to behave the same way.
To make the discussion below flow better, I'm going to take the liberty of renaming your variables:
int **pp; // pointer to pointer
int *ap[1]; // array of pointer
If an expression of type "N-element array of T" appears in most contexts, it will be converted to an expression of type "pointer to T" whose value is the address of the first element in the array (the exceptions to this rule are when the array expression is an operand of either the sizeof or unary & operators, or is a string literal being used to initialize another array in a declaration).
So, suppose you write something like
foo(ap);
The expression ap has type "1-element array of pointer to int", but by the rule above it will be converted to an expression of type "pointer to pointer to int"; thus, the function foo will receive an argument of type int **, not int *[1].
On the other side of the equation, subscripting is defined in terms of pointer arithmetic: E1[E2] is defined as *(E1 + E2) where one of the expressions is a pointer value and the other is an integral value. Thus you can use a subscript operator on pp as though it were an array. This is why we can treat dynamically-allocated buffers as though they were regular arrays:
pp = malloc(sizeof *pp * N); // allocate N pointers to int (type of *pp == int *)
if (pp)
{
size_t i;
for (i = 0; i < N; i++)
pp[i] = ...; // set pp[i] to point to some int value
}
Now for some major differences. First of all, array expressions may not be the target of an assignment; for example, you can't write something like
ap = some_new_pointer_value();
As mentioned above, array expressions will not be converted to pointer types if they are the operands of either the sizeof or unary & operators. Thus, sizeof ap tells you the number of bytes required to store a 1-element array of type int *, not a pointer to a pointer to int. Similarly, the expression &ap has type int *(*)[1] (pointer to 1-element array of pointer to int), rather than int *** (which would be the case for &pp).
No, they are not the same.
int **p is a pointer to a pointer to int.
int *p[1] is an array (of length 1) of pointers to int.
They are not the same:
int **p
is a pointer which points to another pointer whose type is int *
while,
int *p[1];
is an array of size 1 whose element type is int *
They are different.
int **p
means a pointer to a pointer to an int.
int *p[1]
means an array containing one element, with that element being a pointer to an int.
The second form can be treated the same as the first in some situations, e.g. by passing it to a function.

In C, are arrays pointers or used as pointers?

My understanding was that arrays were simply constant pointers to a sequence of values, and when you declared an array in C, you were declaring a pointer and allocating space for the sequence it points to.
But this confuses me: the following code:
char y[20];
char *z = y;
printf("y size is %zu\n", sizeof(y));
printf("y is %p\n", (void *)y);
printf("z size is %zu\n", sizeof(z));
printf("z is %p\n", (void *)z);
when compiled with Apple GCC gives the following result:
y size is 20
y is 0x7fff5fbff930
z size is 8
z is 0x7fff5fbff930
(my machine is 64 bit, pointers are 8 bytes long).
If 'y' is a constant pointer, why does it have a size of 20, like the sequence of values it points to? Is the variable name 'y' replaced by a memory address during compilation time whenever it is appropriate? Are arrays, then, some sort of syntactic sugar in C that is just translated to pointer stuff when compiled?
Here's the exact language from the C standard (n1256):
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
The important thing to remember here is that there is a difference between an object (in C terms, meaning something that takes up memory) and the expression used to refer to that object.
When you declare an array such as
int a[10];
the object designated by the expression a is an array (i.e., a contiguous block of memory large enough to hold 10 int values), and the type of the expression a is "10-element array of int", or int [10]. If the expression a appears in a context other than as the operand of the sizeof or & operators, then its type is implicitly converted to int *, and its value is the address of the first element.
In the case of the sizeof operator, if the operand is an expression of type T [N], then the result is the number of bytes in the array object, not in a pointer to that object: N * sizeof (T).
In the case of the & operator, the value is the address of the array, which is the same as the address of the first element of the array, but the type of the expression is different: given the declaration T a[N];, the type of the expression &a is T (*)[N], or pointer to N-element array of T. The value is the same as a or &a[0] (the address of the array is the same as the address of the first element in the array), but the difference in types matters. For example, given the code
int a[10];
int *p = a;
int (*ap)[10] = &a;
printf("p = %p, ap = %p\n", (void *) p, (void *) ap);
p++;
ap++;
printf("p = %p, ap = %p\n", (void *) p, (void *) ap);
you'll see output on the order of
p = 0xbff11e58, ap = 0xbff11e58
p = 0xbff11e5c, ap = 0xbff11e80
IOW, advancing p adds sizeof (int) (4 here) to the original value, whereas advancing ap adds 10 * sizeof (int) (40).
More standard language:
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
Thus, when you subscript an array expression, what happens under the hood is that the offset from the address of the first element in the array is computed and the result is dereferenced. The expression
a[i] = 10;
is equivalent to
*((a)+(i)) = 10;
which is equivalent to
*((i)+(a)) = 10;
which is equivalent to
i[a] = 10;
Yes, array subscripting in C is commutative; for the love of God, never do this in production code.
Since array subscripting is defined in terms of pointer operations, you can apply the subscript operator to expressions of pointer type as well as array type:
int *p = malloc(sizeof *p * 10);
int i;
for (i = 0; i < 10; i++)
p[i] = some_initial_value();
Here's a handy table to remember some of these concepts:
Declaration: T a[N];
Expression   Type       Converts to    Value
----------   ----       ------------   -----
a            T [N]      T *            Address of the first element in a;
                                       identical to writing &a[0]
&a           T (*)[N]                  Address of the array; value is the same
                                       as above, but the type is different
sizeof a     size_t                    Number of bytes contained in the array
                                       object (N * sizeof T)
*a           T                         Value at a[0]
a[i]         T                         Value at a[i]
&a[i]        T *                       Address of a[i]
Declaration: T a[N][M];
Expression    Type          Converts to    Value
----------    ----          ------------   -----
a             T [N][M]      T (*)[M]       Address of the first subarray (&a[0])
&a            T (*)[N][M]                  Address of the array (same value as
                                           above, but different type)
sizeof a      size_t                       Number of bytes contained in the
                                           array object (N * M * sizeof T)
*a            T [M]         T *            Value of a[0], which is the address
                                           of the first element of the first subarray
                                           (same as &a[0][0])
a[i]          T [M]         T *            Value of a[i], which is the address
                                           of the first element of the i'th subarray
&a[i]         T (*)[M]                     Address of the i-th subarray; same value as
                                           above, but different type
sizeof a[i]   size_t                       Number of bytes contained in the i'th subarray
                                           object (M * sizeof T)
*a[i]         T                            Value of the first element of the i'th
                                           subarray (a[i][0])
a[i][j]       T                            Value at a[i][j]
&a[i][j]      T *                          Address of a[i][j]
Declaration: T a[N][M][O];
Expression    Type             Converts to
----------    ----             -----------
a             T [N][M][O]      T (*)[M][O]
&a            T (*)[N][M][O]
*a            T [M][O]         T (*)[O]
a[i]          T [M][O]         T (*)[O]
&a[i]         T (*)[M][O]
*a[i]         T [O]            T *
a[i][j]       T [O]            T *
&a[i][j]      T (*)[O]
*a[i][j]      T
a[i][j][k]    T
From here, the pattern for higher-dimensional arrays should be clear.
So, in summary: arrays are not pointers. In most contexts, array expressions are converted to pointer types.
Arrays are not pointers, though in most expressions an array name evaluates to a pointer to the first element of the array. So it is very, very easy to use an array name as a pointer. You will often see the term 'decay' used to describe this, as in "the array decayed to a pointer".
One exception is as the operand to the sizeof operator, where the result is the size of the array (in bytes, not elements).
A couple of additional issues related to this:
An array parameter to a function is a fiction - the compiler really passes a plain pointer (this doesn't apply to reference-to-array parameters in C++), so you cannot determine the actual size of an array passed to a function - you must pass that information some other way (maybe using an explicit additional parameter, or using a sentinel element - like C strings do)
Also, a common idiom to get the number of elements in an array is to use a macro like:
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
This has the problem of accepting either an array name, where it will work, or a pointer, where it will give a nonsense result without warning from the compiler. There exist safer versions of the macro (particularly for C++) that will generate a warning or error when it's used with a pointer instead of an array. See the following SO items:
C++ version
a better (though still not perfectly safe) C version
Note: C99 VLAs (variable length arrays) might not follow all of these rules (in particular, they can be passed as parameters with the array size known by the called function). I have little experience with VLAs, and as far as I know they're not widely used. However, I do want to point out that the above discussion might apply differently to VLAs.
sizeof is evaluated at compile-time, and the compiler knows whether the operand is an array or a pointer. For arrays it gives the number of bytes occupied by the array. Your array is a char[] (and sizeof(char) is 1), thus sizeof happens to give you the number of elements. To get the number of elements in the general case, a common idiom is (here for int):
int y[20];
printf("number of elements in y is %zu\n", sizeof(y) / sizeof(int));
For pointers sizeof gives the number of bytes occupied by the raw pointer type.
In
char hello[] = "hello there";
int i;
and
char* hello = "hello there";
int i;
In the first instance (discounting alignment) 12 bytes will be stored for hello with the allocated space initialised to hello there while in the second hello there is stored elsewhere (possibly static space) and hello is initialised to point to the given string.
hello[2] as well as *(hello + 2) will return 'l' in both instances however.
In addition to what the others said, perhaps this article helps: http://en.wikipedia.org/wiki/C_%28programming_language%29#Array-pointer_interchangeability
If 'y' is a constant pointer, why does it have a size of 20, like the sequence of values it points to?
Because z is a pointer that holds the address of y, so sizeof(z) is the size of a pointer, which is always 8 on your machine. You need the dereference operator (*) in order to get the contents it points to; & is the address-of operator.
EDIT: A good distinction between the two: http://www.cs.cf.ac.uk/Dave/C/node10.html

Resources