Why pointers and arrays are not same : Need justification on quote - c

I need the following quote especially the bold text , to be justified.
... This would tend to imply that
somehow source[i] is the same as *(p+i).
In fact, this is true, i.e wherever one writes a[i] it can be replaced with *(a + i) without any problems.
In fact, the compiler will create the same code in either case. Thus we see
that pointer arithmetic is the same thing as array indexing. Either syntax produces the
same result.
This is NOT saying that pointers and arrays are the same thing, they are not. We are only
saying that to identify a given element of an array we have the choice of two syntaxes,
one using array indexing and the other using pointer arithmetic, which yield identical
results.
This is quoted from the pdf
A TUTORIAL ON POINTERS AND ARRAYS IN C
by Ted Jensen
Version 1.2 (PDF Version)
Sept. 2003
P.No : 19

The expressions a[i] and *(a + i) are the same by definition. The C and C++ standards define the expression syntax a[i] to be always equivalent to *(a + i), to the point where a[i] and i[a] are interchangeable.
That does not tell you anything about the relationship between pointers and arrays, because, technically, in both a[i] and *(a + i), a is an expression of pointer type. The [] operator takes a pointer expression on the left, not an array expression; and the + operator only operates on a pointer and an integer -- not an array and an integer.
That you can use an array in place of a is simply a result of the fact that an expression of array type may be implicitly converted to an rvalue expression of pointer type.

Array of type T may behave similar to pointer of type T, but their types are different and this compiler treat these as different types.
Let's understand with some pragmatic examples.
Example 1 where you can see the difference.
int a[1200] = {0};
int *p = a;
assert(sizeof a != sizeof p);
Example 2 where you can see the difference.
/* test.c */
#include<stdio.h>
#include"head.h"
struct token id_tokens[10];
int main()
{
printf("In original file: %p",id_tokens);
testing();
}
/* head.h */
struct token {
int temp;
};
/* test1.c with v1 */
#include<stdio.h>
#include"head.h"
extern struct token* id_tokens;
void testing () {
printf("In other file %p",id_tokens);
}
/* test1.c with v2 */
#include<stdio.h>
#include"head.h"
extern struct token id_tokens[];
void testing () {
printf("In other file %p",id_tokens);
}
Output with v1: Output : In original file: 0x601040In other file
(nil) Output with v2: Output : In original file: 0x601040In other
file 0x601040
Example 2 picked from extern declaration, T* v/s T[]
Further readings on array and pointer: C-faq

An array is a block of consecutive elements, each of the same type. For example, a block of 200 integers.
A pointer is a small object that can identify a memory location.
These are nothing like each other. The array could be gigabytes in size but the pointer is 8 bytes at most (on popular systems).
Saying "arrays and pointers are the same" or "an array is a constant pointer" or other such garbage makes as much sense as saying that a house is an envelope.
All of that has nothing to do with talking about a[n] versus *(a + n). That is a feature of the language syntax, which is a different topic to the layout of variables in memory.
We could ban the use of [] outside of declarators and C would still have the same functionality, but the code would be harder to read. The rules of the language say that you can write a[n] as a more readable version of *(a + n); and they have the same effect.

Pointers and array names are different
Yes.
Consider the below code snippet to get the difference
int a[3];
int *p = a;
a++; /* This is a error */
p++;
printf("%d\n",*p);
Array name can't be a modifiable lvalue.(According to the standard)
Where as a pointer can be.
Adding another example where you can see a difference
printf("%d\n",sizeof(a)/sizeof(*a));
printf("%d\n",sizeof(p)/sizeof(*p));
Number of elements in a array can be given by the first line but not by the second one. Why? Array name and Pointers are different

Related

Access an array from the end in C?

I recently noticed that in C, there is an important difference between array and &array for the following declaration:
char array[] = {4, 8, 15, 16, 23, 42};
The former is a pointer to a char while the latter is a pointer to an array of 6 chars. Also it is notable that the writing a[b] is a syntactic sugar for *(a + b). Indeed, you could write 2[array] and it works perfectly according to the standard.
So we could take advantage of this information to write this:
char last_element = (&array)[1][-1];
&array has a size of 6 chars so (&array)[1]) is a pointer to chars located right after the array. By looking at [-1] I am therefore accessing the last element.
With this I could for example swap the entire array :
void swap(char *a, char *b) { *a ^= *b; *b ^= *a; *a ^= *b; }
int main() {
char u[] = {1,2,3,4,5,6,7,8,9,10};
for (int i = 0; i < sizeof(u) / 2; i++)
swap(&u[i], &(&u)[1][-i - 1]);
}
Does this method for accessing an array by the end have flaws?
The C standard does not define the behavior of (&array)[1].
Consider &array + 1. This is defined by the C standard, for two reasons:
When doing pointer arithmetic, the result is defined for results from the first element (with index 0) of an array to one beyond the last element.
When doing pointer arithmetic, a pointer to a single object behaves like a pointer to an array with one element. In this case, &array is a pointer to a single object (that is itself an array, but the pointer arithmetic is for the pointer-to-the-array, not a pointer-to-an-element).
So &array + 1 is defined pointer arithmetic that points just beyond the end of array.
However, by definition of the subscript operator, (&array)[1] is *(&array + 1). While the &array + 1 is defined, applying * to it is not. C 2018 6.5.6 8 explicitly tells us, about result of pointer arithmetic, “If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.”
Because of the way most compilers are designed, the code in the question may move data around as you desire. However, this is not a behavior you should rely on. You can obtain a good pointer to just beyond the last element of the array with char *End = array + sizeof array / sizeof *array;. Then you can use End[-1] to refer to the last element, End[-2] to refer to the penultimate element, and so on.
Although the Standard specifies that arrayLvalue[i] means (*((arrayLvalue)+(i))), which would be processed by taking the address of the first element of arrayLvalue, gcc sometimes treats [], when applied to an array-type value or lvalue, as an operator which behaves line an indexed version of .member syntax, yielding a value or lvalue which the compiler will treat as being part of the array type. I don't know if this is ever observable when the array-type operand isn't a member of a struct or union, but the effects are clearly demonstrable in cases where it is, and I know of nothing that would guarantee that similar logic wouldn't be applied to nested arrays.
struct foo {unsigned char x[12]};
int test1(struct foo *p1, struct foo *p2)
{
p1->x[0] = 1;
p2->x[1] = 2;
return p1->x[0];
}
int test2(struct foo *p1, struct foo *p2)
{
char *p;
p1->x[0] = 1;
(&p2->x[0])[1] = 2;
return p1->x[0];
}
The code gcc generates for test1 will always return 1, while the generated code for test2 will return whatever is in p1->x[0]. I am unaware of anything in the Standard or the documentation for gcc that would suggest the two functions should behave differently, nor how one should force a compiler to generate code that would accommodate the case where p1 and p2 happen to identify overlapping parts of an allocated block in the event that should be necessary. Although the optimization used in test1() would be reasonable for the function as written, I know of no documented interpretation of the Standard that would treat that case as UB but define the behavior of the code if it wrote to p2->x[0] instead of p2->x[1].
I would do a for loop where I set i = length of the vector - 1 and each time instead of increasing it, I decrease it until it is greater than 0.
for(int i = vet.length;i>0;i--)

Explanation how pointers and multidimensional arrays work in C

I am trying to understand the following code.
#include <stdio.h>
#include <stdlib.h>
void print2(int (* a)[2]) {
int i, j;
for (i = 0; i < 3; i++ ) {
for (j = 0; j < 2; j++ ) {
printf("%d", a[i][j]);
}
printf("\n");
}
}
void print3(int (* a)[3]) {
int i, j;
for (i = 0; i < 2; i++ ) {
for (j = 0; j < 3; j++ ) {
printf("%d", a[i][j]);
}
printf("\n");
}
}
int main() {
int a[] = { 1, 2, 3, 4, 5, 6 };
print2((int (*)[2]) a);
print3((int (*)[3]) a);
return 0;
}
Running the code returns following output in console:
12
34
56
123
456
My problem is I don't understand where these numbers come from. I have trouble understanding what is actually going on in this code. More specifically, I'm uncertain of what this means:
int( (* a)[2])
I hope someone can explain this code to me, because I really want to understand how pointers and multidimensional arrays work in C.
TL;DR
This code contains incorrect and meaningless hacks. There is not much of value to learn from this code.
Detailed explanation follows.
First of all, this is a plain 1D array that gets printed in different ways.
These lines are strictly speaking bugs:
print2((int (*)[2]) a);
print3((int (*)[3]) a);
In both cases there is an invalid pointer conversion, because a is of type int[6] and a pointer to the array a would have to be int (*)[6]. But the print statements are wrong in another way too, a when used in an expression like this "decays" into a pointer to the first element. So the code is casting from int* to int(*)[2] etc, which is invalid.
These bugs can in theory cause things like misaligned access, trap representations or code getting optimized away. In practice it will very likely "work" on all mainstream computers, even though the code is relying on undefined behavior.
If we ignore that part and assume void print2(int (*a)[2]) gets a valid parameter, then a is a pointer to an array of type int[2].
a[i] is pointer arithmetic on such a type, meaning that each i would correspond to an int[2] and if we had written a++, the pointer would jump forward sizeof(int[2]) in memory (likely 8 bytes).
Therefore the function abuses this pointer arithmetic on a[i] to get array number i, then do [j] on that array to get the item in that array.
If you actually had a 2D array to begin with, then it could make sense to declare the functions as:
void print (size_t x, size_t y, int (*a)[x][y])
Though this would be annoying since we would have to access the array as (*a)[i][j]. Instead we can use a similar trick as in your code:
void print (size_t x, size_t y, int (*a)[x][y])
{
int(*arr)[y] = a[0];
...
arr[i][j] = whatever; // now this syntax is possible
This trick too uses pointer arithmetic on the array pointer arr, then de-references the array pointed at.
Related reading that explains these concepts with examples: Correctly allocating multi-dimensional arrays
void print2(int (*a)[2]) { /*...*/ }
inside the function print2 a is a pointer to arrays of 2 ints
void print3(int (*a)[3]) { /*...*/ }
inside the function print3 a is a pointer to arrays of 3 ints
int a[] = {1, 2, 3, 4, 5, 6};
inside the function main a is an array of 6 ints.
In most contexts (including function call context) a is converted to a pointer to the first element: a value of type "pointer to int".
The types "pointer to int", "pointer to array of 2/3 ints" are not compatible, so calling any of the functions with print2(a) (or print3(a)) forces a diagnostic from the compiler.
But you use a cast to tell the compiler: "do not issue any diagnostic. I know what I'm doing"
print3(a); // type of a (after conversion) and type of argument of print3 are not compatible
// print3((cast)a); // I know what I'm doing
print3((int (*)[3])a); // change type of a to match argument even if it does not make sense
It would be much easier to understand if you break it down and understand things. What if you were to pass the whole array to say a function print4, which iterates over the array and prints the elements? How would you pass the array to such a function.
You can write it something like
print4( (int *) a);
which can be simplified and just written as print4(a);
Now in your case by doing print2((int (*)[2]) a);, you are actually designing a pointer to an array of 2 int elements. So now the a is pointer in array of two elements i.e. every increment to the pointer will increase the offset by 2 ints in the array a
Imagine with the above modeling done, your original array becomes a two dimensional array of 3 rows with 2 elements each. That's how your print2() element iterates over the array a and prints the ints. Imagine a function print2a that works by taking a local pointer to a and increments at each iteration to the point to the next two elements
void print2a(int (* a)[2]) {
int (* tmp)[2] = a;
for( int i = 0; i < 3; i++ ) {
printf("%d%d\n", tmp[0][0], tmp[0][1] );
tmp++;
}
}
The same applies to print3() in which you pass an pointer to array of 3 ints, which is now modeled as a 2-D array of 2 rows with 3 elements in it.
The code seeks to reinterpret the array int a[6] as if it were int a[3][2] or int a[2][3], that is, as if the array of six int in memory were three arrays of two int (in print2) or two arrays of three int (in print3).
While the C standard does not fully define the pointer conversions, this can be expected to work in common C implementations (largely because this sort of pointer conversion is used in existing software, which provides motivation for compilers to support it).
In (int (*)[2]) a, a serves as a pointer to its first element.1 The cast converts this pointer to int to a pointer to an array of two int. This conversion is partially defined C 2018 6.3.2.3 7:
The behavior is undefined if the alignment of a is not suitable for the type int (*)[2]. However, compilers that have stricter alignment for arrays than for their element types are rare, and no practical compiler has a stricter alignment for an array of six int than it does for an array of two or three int, so this will not occur in practice.
When the resulting pointer is converted back to int *, it will compare equal to the original pointer.
The latter property tells us that the resulting pointer contains all the information of the original pointer, since it must contain the information needed to reconstruct the original pointer. It does not tell us that the resulting pointer is actually pointing to the memory where a is.
As noted above, common C implementations allow this. I know that Apple’s versions of GCC and Clang support this reshaping of arrays, although I do not know whether this guarantee was added by Apple or is in the upstream versions.
Given that (int (*)[2]) is passed to print2 as its a, then a[i][j] refers to element j of array i. That is, a points to an array of two int, so a[0] is that array, a[1] is the array of two int that follows it in memory, and a[2] is the array of two int after that. Then a[i][j] is element j of the selected array. In effect, a[i][j] in print2 is a[i*2+j] in main.
Note that no aliasing rules are violated as no arrays are accessed by a[i][j]: a is a pointer, a[i] is an array but is not accessed (it is automatically converted to a pointer, per footnote 1 below), and a[i][j] has type int and accesses an object with effective type int, so C’s aliasing rules in C 2018 6.5 7 are satisfied.
Footnotes
1 This is because when an array is used in an expression, it is automatically converted to a pointer to its first element, except when it is the operand of sizeof, is the operand of unary &, or is a string literal used to initialize an array.

multiArray and multiArray[0] and &multiArray[0] same?

On 6th line instead of multiArray[0], when I write multiArray, program still works. Don't understand why. I was thinking before that multiArray is a pointer to multiArray[0] which is a pointer to multiArray[0][0]. So multiArray alone is a pointer to a pointer. multiArray[0] is a pointer to a 4 element int array. So it seems that multiArray and multiArray[0] must be different. But in below code, both work. Print function I wrote expects a pointer to a 4 element int array. So only multiArray[0] must work and multiArray must not work. But both works. Didn't understand that.
#include <stdio.h>
void printArr(int(*ptr)[4]);
int i, k;
int main(void){
int multiArray[3][4] = { { 1, 5, 2, 4 }, { 0, 6, 3, 14 }, { 132, 4, 22, 5 } };
int(*point)[4] = multiArray[0];
for (k = 0; k < 3; k++)
{
printArr(point++);
}
getchar();
}
void printArr(int(*ptr)[4]){
int *temp = (int *)ptr;
for (i = 0; i < 4; i++)
{
printf("%d ", *temp);
temp++;
}
puts("\n");
}
Someone else wrote "Multi-dimensional arrays are syntactic sugar for 1-D arrays".
This is sort of like saying that int is just syntactic sugar for a unsigned char[4] . You could do away with expressions like 4 + 5 and get the same result by manipulating arrays of 4 bytes.
You could even say that C is just syntactic sugar for a Universal Turing Machine script, if you want to take this concept a bit further.
The reality is that multi-dimensional arrays are a part of the type system in C, and they have syntax associated with them. There's more than one way to skin a cat.
Moving on, the way C arranges what we are calling a multi-dimension array is to say: "Arrays can only have one dimension, but the element type may itself be another array". We say "multi-dimension array" as a matter of convenience, but the syntax and the type system actually reflect the one-dimensional nature of the array.
So, int multiArray[3][4] is an array of 3 elements. Each of those elements is an array of 4 ints.
In memory, an array's elements are stored contiguously -- regardless of what the element type is. So, the memory layout is an array of 4 int, immediately followed by another array of 4 int, and finally another array of 4 int.
There are 12 contiguous int in memory, and in the C type system they are grouped up into 3 groups of 4.
You will note that the first int of the 12 is also the first int of the first group of 4. This is why we find that if we ask "What is the memory location of the first int?", "What is the memory location of the first group of 4 ints?", and "What is the memory location of the entire bloc of 12 ints?", we get the same answer every time. (In C, the memory location of a multi-byte object is considered to start at the location of its first byte).
Now, to talk about the pointer syntax and representation. In C, a pointer tells you where in memory an object can be found. There are two aspects to this: the memory location of the object, and what type of object it is. (The size of the object is a corollary of the type).
Some presentations only focus on the first of those, they will say things like "A pointer is just a number". But that is forgetting about the type information, which is a crucial part of a pointer.
When you print the pointer with %p, you lose the type information. You're just putting out the location in memory of the first byte. So they all look the same, despite the fact that the three pointers are pointing at differently-sized objects (which overlap each other like matruskha dolls).
In most implementations of C, the type information is all computed at compile-time, so if you try to understand C by comparing source code with assembly code (some people do this), you only see the memory-location part of the pointer. This can lead to misunderstanding if you forget that the type information is also crucial.
Footnote: All of this is independent of a couple of syntax quirks that C has; which have caused a lot of confusion over the years (but are also useful sometimes). The expression x is a shortcut for &x[0] if x is an array, except when used as the operand of & or sizeof. (Otherwise this would be a recursive definition!). The second quirk is that if you write what looks like an array declarator in a function formal parameter list, it is actually as if you wrote a pointer declarator. I stress again that these are just syntax oddities, they are not saying anything fundamental about the nature of arrays and pointers, which is actually not that complicated. The language would work just as well without both of these quirks.
Multidiemensional arrays var_t arr[size_y][size_x] provide means of declaring and accessing array elements (memory) in a conveniant manner. But all multidiemensional arrays are internally continuous memory blocks.
You may say that arr[y][x] = arr[y*cols+x].
In terms of pointer-level, the pointers multiArray and multiArray[0] are the same, they're int* - though the formal type for arr will be int (*)[2]. Using that type will allow one to take advantage of all pointer mechanics (++ on such pointer will move the address by 8 bytes, not 4).
Try this:
void t1(int* param)
{
printf("t1: %d\n", *param);
}
void t2(int** param)
{
printf("t2: %d\n", **param);
}
int main(void) {
int arr[2][2] = { { 1, 2 } , { 3, 4 } };
t1(arr); // works ok
t1(arr[0]); // works ok
t2(arr); // seg fault
t2(arr[0]);
}
int(*point)[4] = multiArray[0];
This works because both multiArray[0] and multiArray point to same address, the address of first element of array: multiArray[0][0].
However in this case, you may get a warning from compiler because type of multiArray[0] is int* while of point is int [4]*(pointer to array of 4 integers).

C / C++ MultiDimensional Array Internals

I have a question about how C / C++ internally stores multidimensional arrays declared using the notation foo[m][n]. I am not questioning pure pointers to pointers etc... I am asking because of speed reasons...
Correct me if I am wrong, but syntactically foo is an array of pointers, which themselves point to an array
int foo[5][4]
*(foo + i) // returns a memory address
*( *(foo + i) + j) // returns an int
I have heard from many places that the C/C++ compiler converts foo[m][n] to a one dimensional array behind the scenes (calculating the required one dimension index with i * width + j). However if this was true then the following would hold
*(foo + 1) // should return element foo[0][1]
Thus my question:
Is it true that foo[m][n] is (always?) stored in memory as a flat one dimensional array?? If so, why does the above code work as shown.
A two-dimensional array:
int foo[5][4];
is nothing more or less than an array of arrays:
typedef int row[4]; /* type "row" is an array of 4 ints */
row foo[5]; /* the object "foo" is an array of 5 rows */
There are no pointer objects here, either explicit or implicit.
Arrays are not pointers. Pointers are not arrays.
What often causes confusion is that an array expression is, in most contexts, implicitly converted to a pointer to its first element. (And a separate rule says that what looks like an array parameter declaration is really a pointer declaration, but that doesn't apply in this example.) An array object is an array object; declaring such an object does not create any pointer objects. Referring to an array object can create a pointer value (the address of the array's first element), but there is no pointer object stored in memory.
The array object foo is stored in memory as 5 contiguous elements, where each element is itself an array of 4 contiguous int elements; the whole thing is therefore stored as 20 contiguous int objects.
The indexing operator is defined in terms of pointer arithmetic; x[y] is equivalent to *(x + y). Typically the left operand is going to be either a pointer expression or an array expression; if it's an array expression, the array is implicitly converted to a pointer.
So foo[x][y] is equivalent to *(foo[x] + y), which in turn is equivalent to *(*(foo + x) + y). (Note that no casts are necessary.) Fortunately, you don't have to write it that way, and foo[x][y] is a lot easier to understand.
Note that you can create a data structure that can be accessed with the same foo[x][y] syntax, but where foo really is a pointer to pointer to int. (In that case, the prefix of each [] operator is already a pointer expression, and doesn't need to be converted.) But to do that, you'd have to declare foo as a pointer-to-pointer-to-int:
int **foo;
and then allocate and initialize all the necessary memory. This is more flexible than int foo[5][4], since you can determine the number of rows and the size (or even existence) of each row dynamically.
Section 6 of the comp.lang.c FAQ explains this very well.
EDIT:
In response to Arrakis's comment, it's important to keep in mind the distinction between type and representation.
For example, these two types:
struct pair { int x; int y;};
typedef int arr2[2];
very likely have the same representation in memory (two consecutive int objects), but the syntax to access the elements is quite different.
Similarly, the types int[5][4] and int[20] have the same memory layout (20 consecutive int objects), but the syntax to access the elements is different.
You can access foo[2][2] as ((int*)foo)[10] (treating the 2-dimensional array as if it were a 1-dimensional array). And sometimes it's useful to do so, but strictly speaking the behavior is undefined. You can likely get away with it because most C implementations don't do array bounds-checking. On the other hand, optimizing compilers can assume that your code's behavior is defined, and generate arbitrary code if it isn't.
Yes, C/C++ stores a multi-dimensional (rectangular) array as a contiguous memory area. But, your syntax is incorrect. To modify element foo[0][1], the following code will work:
*((int *)foo+1)=5;
The explicit cast is necessary, because foo+1, is the same as &foo[1] which is not at all the same thing as foo[0][1]. *(foo+1) is a pointer to the fifth element in the flat memory area. In other words, *(foo+1) is basically foo[1] and **(foo+1) is foo[1][0]. Here is how the memory is laid out for some of your two dimensional array:
C arrays - even multi-dimensional ones - are contiguous, ie an array of type int [4][5] is structurally equivalent to an array of type int [20].
However, these types are still incompatible according to C language semantics. In particular, the following code is in violation of the C standard:
int foo[4][5] = { { 0 } };
int *p = &foo[0][0];
int x = p[12]; // undefined behaviour - can't treat foo as int [20]
The reason for this is that the C standard is (probably intentionally) worded in a way which makes bounds-checking implementations possible: As p is derived from foo[0], which has type int [5], valid indices must be in range 0..5 (resp. 0..4 if you actually access the element).
Many other programming languages (Java, Perl, Python, JavaScript, ...) use jagged arrays to implement multi-dimensional arrays. This is also possible in C by using an array of pointers:
int *bar[4] = { NULL };
bar[0] = (int [3]){ 0 };
bar[1] = (int [5]){ 1, 2, 3, 4 };
int y = bar[1][2]; // y == 3
However, jagged arrays are not contiguous, and the pointed-to arrays need not be of uniform size.
Because of implicit conversion of array expressions into pointer expressions, indexing jagged and non-jagged arrays looks identical, but the actual address calculations will be quite different:
&foo[1] == (int (*)[5])((char *)&foo + 1 * sizeof (int [5]))
&bar[1] == (int **)((char *)&bar + 1 * sizeof (int *))
&foo[1][2] == (int *)((char *)&foo[1] + 2 * sizeof (int))
== (int *)((char *)&foo + 1 * sizeof (int [5]) + 2 * sizeof (int))
&bar[1][2] == (int *)((char *)bar[1] + 2 * sizeof (int)) // no & before bar!
== (int *)((char *)*(int **)((char *)&bar + 1 * sizeof (int *))
+ 2 * sizeof (int))
int foo[5][4];
foo is not an array of pointers; it's an array of arrays. Below image will help.

Does C99 guarantee that arrays are contiguous?

Following an hot comment thread in another question, I came to debate of what is and what is not defined in C99 standard about C arrays.
Basically when I define a 2D array like int a[5][5], does the standard C99 garantee or not that it will be a contiguous block of ints, can I cast it to (int *)a and be sure I will have a valid 1D array of 25 ints.
As I understand the standard the above property is implicit in the sizeof definition and in pointer arithmetic, but others seems to disagree and says casting to (int*) the above structure give an undefined behavior (even if they agree that all existing implementations actually allocate contiguous values).
More specifically, if we think an implementation that would instrument arrays to check array boundaries for all dimensions and return some kind of error when accessing 1D array, or does not give correct access to elements above 1st row. Could such implementation be standard compilant ? And in this case what parts of the C99 standard are relevant.
We should begin with inspecting what int a[5][5] really is. The types involved are:
int
array[5] of ints
array[5] of arrays
There is no array[25] of ints involved.
It is correct that the sizeof semantics imply that the array as a whole is contiguous. The array[5] of ints must have 5*sizeof(int), and recursively applied, a[5][5] must have 5*5*sizeof(int). There is no room for additional padding.
Additionally, the array as a whole must be working when given to memset, memmove or memcpy with the sizeof. It must also be possible to iterate over the whole array with a (char *). So a valid iteration is:
int a[5][5], i, *pi;
char *pc;
pc = (char *)(&a[0][0]);
for (i = 0; i < 25; i++)
{
pi = (int *)pc;
DoSomething(pi);
pc += sizeof(int);
}
Doing the same with an (int *) would be undefined behaviour, because, as said, there is no array[25] of int involved. Using a union as in Christoph's answer should be valid, too. But there is another point complicating this further, the equality operator:
6.5.9.6
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. 91)
91) Two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated. If prior invalid pointer operations (such as accesses outside array bounds) produced undefined behavior, subsequent comparisons also produce undefined behavior.
This means for this:
int a[5][5], *i1, *i2;
i1 = &a[0][0] + 5;
i2 = &a[1][0];
i1 compares as equal to i2. But when iterating over the array with an (int *), it is still undefined behaviour, because it is originally derived from the first subarray. It doesn't magically convert to a pointer into the second subarray.
Even when doing this
char *c = (char *)(&a[0][0]) + 5*sizeof(int);
int *i3 = (int *)c;
won't help. It compares equal to i1 and i2, but it isn't derived from any of the subarrays; it is a pointer to a single int or an array[1] of int at best.
I don't consider this a bug in the standard. It is the other way around: Allowing this would introduce a special case that violates either the type system for arrays or the rules for pointer arithmetic or both. It may be considered a missing definition, but not a bug.
So even if the memory layout for a[5][5] is identical to the layout of a[25], and the very same loop using a (char *) can be used to iterate over both, an implementation is allowed to blow up if one is used as the other. I don't know why it should or know any implementation that would, and maybe there is a single fact in the Standard not mentioned till now that makes it well defined behaviour. Until then, I would consider it to be undefined and stay on the safe side.
I've added some more comments to our original discussion.
sizeof semantics imply that int a[5][5] is contiguous, but visiting all 25 integers via incrementing a pointer like int *p = *a is undefined behaviour: pointer arithmetics is only defined as long as all pointers invoved lie within (or one element past the last element of) the same array, as eg &a[2][1] and &a[3][1] do not (see C99 section 6.5.6).
In principle, you can work around this by casting &a - which has type int (*)[5][5] - to int (*)[25]. This is legal according to 6.3.2.3 §7, as it doesn't violate any alignment requirements. The problem is that accessing the integers through this new pointer is illegal as it violates the aliasing rules in 6.5 §7. You can work around this by using a union for type punning (see footnote 82 in TC3):
int *p = ((union { int multi[5][5]; int flat[25]; } *)&a)->flat;
This is, as far as I can tell, standards compliant C99.
If the array is static, like your int a[5][5] array, it's guaranteed to be contiguous.

Resources