I was just looking something up in ISO/IEC 9899 when I stumbled on this:
6.7.6 Type names
[...]
Semantics
2 In several contexts, it is necessary to specify a type. This is accomplished using a type
name, which is syntactically a declaration for a function or an object of that type that
omits the identifier.128)
3 EXAMPLE The constructions
(a) int
(b) int *
(c) int *[3]
(d) int (*)[3]
(e) int (*)[*]
(f) int *()
(g) int (*)(void)
(h) int (*const [])(unsigned int, ...)
name respectively the types (a) int, (b) pointer to int, (c) array of three pointers to int, (d) pointer to an
array of three ints, (e) pointer to a variable length array of an unspecified number of ints, (f) function
with no parameter specification returning a pointer to int, (g) pointer to function with no parameters
returning an int, and (h) array of an unspecified number of constant pointers to functions, each with one
parameter that has type unsigned int and an unspecified number of other parameters, returning an
int.
What most confused me was:
(e) pointer to a variable length array of an unspecified number of ints
The others I can understand more or less. But what is the use of a pointer to a VLA of unspecified number of 'ints'?
And is there even a need for compilers to support the syntax of
int foo[*];
?
EDIT for clarification
This question primarily asks: "Is it even necessary for a compiler to support this?"
While this post, "ANSI-C grammar - array declarations like [*] et alii", clearly improved my knowledge, there is still no answer to: why does the compiler need to know whether the parameter in the prototype is just an address of unknown size, as with int foo[], or an array of unspecified size?
So is it really necessary to support this?
And if not, why does the standard even define these semantics?
Why does the compiler need to know whether the parameter in the prototype is just an address of unknown size, as with int foo[], or an array of unspecified size?
The compiler doesn't need to "know" anything, it's a tool.
The difference between int (*)[*] and int[] is about the same as between int (*)[5] and int[]. If you agree that the latter pair is not interchangeable, then the former isn't either.
In pre-C99, the way to specify an array of unknown number of T elements is T[]. This is an incomplete type, which means you cannot have an array of T[]. There is no T[][]. Inside a function declarator, T[] means the same as T*. OTOH T[*] is a variable-length array, which is different from an array of unknown number of elements. You can have an array of variable-size arrays, i.e. there is T[*][*]. The syntax you are asking about is necessary to support this variable-size-array type. Luckily you are not asking why we need different types, because the answer would be really long-winded, but here's my stab at it.
The purpose of types is two-fold. First, types are needed for object code generation (things like a++ typically generate different object code, depending on the type of a). Second, types are needed for type-checking (things like a++ may be allowed or not depending on the type of a).
The [*] types are only allowed in function declarators that are not parts of function definitions. So code generation is not relevant here. This leaves us with type checking. Indeed,
int foo(int, int (*)[*]);
int bar(int, int (*)[5]);
int main ()
{
    int a;
    int aa[5];
    int aaa[5][5];
    foo(1, &a); // incorrect, `&a` is `int*`, `int*` and `int (*)[*]` are different
    bar(1, &a); // incorrect, `&a` is `int*`, `int*` and `int (*)[5]` are different
    foo(5, aa); // incorrect, `aa` is `int*` (!), `int*` and `int (*)[*]` are different
    bar(5, aa); // incorrect, `aa` is `int*` (!), `int*` and `int (*)[5]` are different
    foo(5, &aa); // correct
    bar(5, &aa); // correct
    foo(5, aaa); // correct
    bar(5, aaa); // correct
}
If we agree on which calls to bar are correct and which are not, we must also agree on the calls to foo.
The only remaining question is, why int foo(int m, int (*)[m]); is not enough for this purpose? It probably is, but the C language does not force the programmer to name formal parameters in function declarators where parameter names are not needed. [*] allows this small freedom in case of VLAs.
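To make the prototype/definition split concrete, here is a minimal sketch. The function name sum_rows and the parameter names rows, cols and m are made up for illustration; the point is only that the prototype can use [*] without naming anything, while the definition has to name the size. It assumes a compiler with VLA support.

```c
#include <stddef.h>

/* Prototype only: [*] means "a row length known at run time".
   No parameter names are needed here. */
long sum_rows(size_t, size_t, int (*)[*]);

/* Definition: [*] is not permitted here, so the row length is
   spelled with the (hypothetical) parameter name cols. */
long sum_rows(size_t rows, size_t cols, int (*m)[cols])
{
    long total = 0;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            total += m[r][c];
    return total;
}
```

A caller with int a[2][3] would write sum_rows(2, 3, a); the argument decays to int (*)[3], which is compatible with the variably modified parameter type.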
I am going to answer your question strictly as asked:
This question primarily asks: "Is it even necessary for a compiler to support this?"
For a C99 compiler, yes: it is part of the standard so a C99-conforming compiler must support it. The question of what int foo[*]; is useful for is quite orthogonal to the question of whether it must be supported. All compilers claiming to conform to C99 that I tested supported it (but I am not sure what it is useful for, either).
For a C11 compiler, good news! Variable-Length Arrays have been made a “conditional feature”. You can implement a C11-compliant compiler without Variable-Length Arrays as long as it defines __STDC_NO_VLA__:
6.10.8.3 Conditional feature macros
…
__STDC_NO_VLA__ The integer constant 1, intended to indicate that the implementation does not support variable length arrays or variably modified types.
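As a rough sketch of how code might react to that macro: the HAVE_VLA helper macro and the fill / fill_flat functions below are invented for this example, not anything the standard provides. Only the __STDC_NO_VLA__ test itself comes from the quoted text, and it is only meaningful for C11 and later.

```c
#include <stddef.h>

/* C11 and later: __STDC_NO_VLA__ is defined as 1 when VLAs are absent. */
#if defined(__STDC_NO_VLA__) && __STDC_NO_VLA__ == 1
#define HAVE_VLA 0
#else
#define HAVE_VLA 1
#endif

#if HAVE_VLA
/* VLA parameter: the compiler does the row arithmetic. */
void fill(size_t rows, size_t cols, int m[rows][cols], int value)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            m[r][c] = value;
}
#endif

/* Fallback that needs no VLA support: treat the data as one flat buffer. */
void fill_flat(size_t rows, size_t cols, int *m, int value)
{
    for (size_t i = 0; i < rows * cols; ++i)
        m[i] = value;
}
```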
If I pass an array with more than one dimension to a function, and if the function parameters used to express the number of elements in a given dimension of the array come after the array parameter itself, the [*] syntax may be used. In the case of an array with more than two dimensions, and if the array parameter, again, precedes the element count parameters, this syntax must be used, as array decay only ever occurs once. After all, you can't very well use int (*)[][] or int [][][] because the standard requires that in int [A][B] and int [A][B][C][D], only A may be omitted due to the array decaying to a pointer. If you use pointer notation in the function parameter, you're allowed to use int (*)[], but this makes very little sense to me, especially since:
sizeof ptr[0] and sizeof *ptr are both illegal -- how should the compiler determine the size of an array that has an indeterminate element count? Instead you must find it at runtime using N * sizeof **ptr or sizeof(int [N]). This also means that any arithmetic operations on ptr, such as the usage of ++ptr, are illegal since they rely upon the size information, which cannot be calculated. Type casts may be used to get around this, but it is easier just to use a local variable with the proper type information. Then again, why not just use the [*] syntax and include the proper type information from the start?
sizeof ptr[0][0] is illegal, but sizeof (*ptr)[0] is not -- array indexing is still performed even when simply getting info like size, so it is like writing sizeof (*(ptr + 0))[0], which is illegal because you cannot apply arithmetic operations to an incomplete type as previously mentioned.
Someone who has never encountered this issue before might think [] can be replaced by *, yielding int ** instead of int (*)[], which is incorrect because that sub-array has not decayed. Array decay only occurs once.
I noted that the [*] syntax is unnecessary if the parameters used as element counts came first, which is true, but when is the last time anybody saw any of the following?
void foo (int a, int b, int c, int arr[a][b][c]);
void bar (int a, int b, int c, int arr[][b][c]);
void baz (int a, int b, int c, int (*arr)[b][c]);
So to answer your question:
if a function is able to operate upon multidimensional arrays of various lengths (or a pointer to a 1-D array),
and
the parameters denoting element count are listed after the array parameter itself,
the [*] syntax may be required. I actually encourage usage of [*] since [] comes with problems when size information is required.
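For the case where the element count comes after the array parameter, here is a small sketch of the "local variable with the proper type information" workaround mentioned above. The function name trace_after and its parameters are hypothetical, and VLA support is assumed.

```c
#include <stddef.h>

/* The row length n arrives after the array parameter, so the parameter
   itself can only be a pointer to an incomplete array type. */
double trace_after(double (*m)[], size_t n)
{
    /* Recover full type information in a local pointer to n-element rows.
       double (*)[] and double (*)[n] are compatible, so no cast is needed. */
    double (*a)[n] = m;
    double t = 0.0;
    for (size_t i = 0; i < n; ++i)
        t += a[i][i];   /* indexing works on a, but would not on m */
    return t;
}
```

Given double id[3][3], the call is simply trace_after(id, 3).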
In C99 it is possible to declare arrays using variable dimensions, provided the variable has a positive integer value at the time the declaration is made. It turns out that this carries over to the declaration of arrays in function parameters as well, which can be particularly useful for multi-dimensional arrays.
For example, in the prototype:
int arrayFunction( int nRows, int nCols, double x[ nRows ],
double y[ nRows ][ nCols ] );
the variable dimension on x is informative to the human but not necessary for the computer, since we could have declared it as x[ ]. However the nCols dimension on y is very useful, because otherwise the function would have to be written for arrays with pre-determined row sizes, and now we can write a function that will work for arrays with any row length.
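A possible body for a function with that parameter shape, just to show the nCols dimension doing real work. The name row_sums and what it computes are my own choice; only the parameter list mirrors the prototype above.

```c
/* Store the sum of each row of y in the matching element of x.
   y[r][c] is addressed correctly for any nCols because the row
   length is part of the parameter's type. */
void row_sums(int nRows, int nCols, double x[nRows], double y[nRows][nCols])
{
    for (int r = 0; r < nRows; ++r) {
        x[r] = 0.0;
        for (int c = 0; c < nCols; ++c)
            x[r] += y[r][c];
    }
}
```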
For two array types to be compatible, both must have compatible element types, and if both size specifiers are present and are integer constant expressions, then both sizes must have the same value. A VLA is always compatible with another array type if they both have the same element type. If the two array types are used in a context that requires them to be compatible, it is undefined behavior if the dimension sizes are unequal at run time.
It might be useful if you want to work with "jagged" arrays, where the row sizes of the matrix are unknown at compile time, are initialized at run time, and then remain the same for the whole execution. But to make sure you stay in bounds for each row, you will have to store the actual sizes of that array somehow, separately for each row if you want it "jagged", because sizeof cannot recover them from a pointer to a row's storage (it yields the size of the pointer at best).
Related
I imagine we all agree that it is considered idiomatic C to access a true multidimensional array by dereferencing a (possibly offset) pointer to its first element in a one-dimensional fashion, e.g.:
void clearBottomRightElement(int *array, int M, int N)
{
    array[M*N-1] = 0; // Pretend the array is one-dimensional
}
int mtx[5][3];
...
clearBottomRightElement(&mtx[0][0], 5, 3);
However, the language-lawyer in me needs convincing that this is actually well-defined C! In particular:
Does the standard guarantee that the compiler won't put padding in-between e.g. mtx[0][2] and mtx[1][0]?
Normally, indexing off the end of an array (other than one-past the end) is undefined (C99, 6.5.6/8). So the following is clearly undefined:
struct {
    int row[3]; // The object in question is an int[3]
    int other[10];
} foo;
int *p = &foo.row[7]; // ERROR: A crude attempt to get &foo.other[4];
So by the same rule, one would expect the following to be undefined:
int mtx[5][3];
int (*row)[3] = &mtx[0]; // The object in question is still an int[3]
int *p = &(*row)[7]; // Why is this any better?
So why should this be defined?
int mtx[5][3];
int *p = &(&mtx[0][0])[7];
So what part of the C standard explicitly permits this? (Let's assume c99 for the sake of discussion.)
EDIT
Note that I have no doubt that this works fine in all compilers. What I'm querying is whether this is explicitly permitted by the standard.
All arrays (including multidimensional ones) are padding-free. Even if it's never explicitly mentioned, it can be inferred from sizeof rules.
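The sizeof argument can be written down as compile-time checks. This is only a sketch of that inference, assuming a C11 compiler for static_assert; row_stride_bytes is a hypothetical helper added so the layout can also be observed at run time.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof an array is the element count times sizeof the element,
   which leaves no room for padding between elements or rows. */
static_assert(sizeof(int[3]) == 3 * sizeof(int), "no padding inside a row");
static_assert(sizeof(int[5][3]) == 5 * sizeof(int[3]), "no padding between rows");

/* Distance in bytes from mtx[0][0] to mtx[1][0]. */
size_t row_stride_bytes(void)
{
    static int mtx[5][3];
    return (size_t)((char *)&mtx[1][0] - (char *)&mtx[0][0]);
}
```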
Now, array subscripting is a special case of pointer arithmetic, and C99 section 6.5.6, §8 states clearly that behaviour is only defined if the pointer operand and the resulting pointer lie in the same array (or one element past), which makes bounds-checking implementations of the C language possible.
This means that your example is, in fact, undefined behaviour. However, as most C implementations do not check bounds, it will work as expected - most compilers treat undefined pointer expressions like
mtx[0] + 5
identically to well-defined counterparts like
(int *)((char *)mtx + 5 * sizeof (int))
which is well-defined because any object (including the whole two-dimensional array) can always be treated as a one-dimensional array of type char.
On further meditation on the wording of section 6.5.6, splitting out-of-bounds access into seemingly well-defined subexpression like
(mtx[0] + 3) + 2
reasoning that mtx[0] + 3 is a pointer to one element past the end of mtx[0] (making the first addition well-defined) as well as a pointer to the first element of mtx[1] (making the second addition well-defined) is incorrect:
Even though mtx[0] + 3 and mtx[1] + 0 are guaranteed to compare equal (see section 6.5.9, §6), they are semantically different. For example, the former can't be dereferenced and thus does not point to an element of mtx[1].
The only obstacle to the kind of access you want to do is that objects of type int [5][3] and int [15] are not allowed to alias one another. Thus if the compiler is aware that a pointer of type int * points into one of the int [3] arrays of the former, it could impose array bounds restrictions that would prevent accessing anything outside that int [3] array.
You might be able to get around this issue by putting everything inside a union that contains both the int [5][3] array and the int [15] array, but I'm really unclear on whether the union hacks people use for type-punning are actually well-defined. This case might be slightly less problematic since you would not be type-punning individual cells, only the array logic, but I'm still not sure.
One special case that should be noted: if your type were unsigned char (or any char type), accessing the multi-dimensional array as a one-dimensional array would be perfectly well-defined. This is because the one-dimensional array of unsigned char that overlaps it is explicitly defined by the standard as the "representation" of the object, and is inherently allowed to alias it.
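A short sketch of that special case: walking the whole object through an unsigned char pointer, regardless of how many dimensions it has. The helper name clear_bytes is made up; in practice memset does the same job.

```c
#include <stddef.h>

/* Zero every byte of an object by treating it as its representation,
   i.e. a one-dimensional array of unsigned char. */
void clear_bytes(void *obj, size_t size)
{
    unsigned char *bytes = obj;
    for (size_t i = 0; i < size; ++i)
        bytes[i] = 0;
}
```

For int mtx[5][3], the call clear_bytes(mtx, sizeof mtx) crosses the row boundaries as bytes, never as int elements.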
It is sure that there is no padding between the elements of an array.
There are provisions for doing address computation in a smaller size than the full address space. This could be used, for instance, in the huge mode of the 8086 so that the segment part would not always be updated if the compiler knew that you couldn't cross a segment boundary. (It's too long ago for me to remember whether the compilers I used took advantage of that or not.)
With my internal model -- I'm not sure it is perfectly the same as the standard one and it is too painful to check, the information being distributed everywhere --
what you are doing in clearBottomRightElement is valid.
int *p = &foo.row[7]; is undefined
int i = mtx[0][5]; is undefined
int *p = &row[7]; doesn't compile (gcc agrees with me)
int *p = &(&mtx[0][0])[7]; is in the gray zone (last time I checked something like this in detail, I ended up considering it invalid C90 and valid C99; that could be the case here, or I could have missed something).
My understanding of the C99 standard is that there is no requirement that multidimensional arrays be laid out contiguously in memory. The following is the only relevant information I found in the standard (each dimension is guaranteed to be contiguous).
If you want to use the x[COLS*r + c] access, I suggest you stick to single dimension arrays.
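For illustration, the single-dimension approach could look like this. ROWS, COLS and the helper names cell and clear_bottom_right are assumptions for the sketch; every access stays inside the one int array, so no bounds question arises.

```c
#include <stddef.h>

enum { ROWS = 5, COLS = 3 };

/* Address of logical element (r, c) in a flat array of ROWS * COLS ints. */
int *cell(int *x, size_t r, size_t c)
{
    return &x[COLS * r + c];
}

/* Flat-array counterpart of clearBottomRightElement. */
void clear_bottom_right(int *x)
{
    *cell(x, ROWS - 1, COLS - 1) = 0;
}
```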
Array subscripting
Successive subscript operators designate an element of a multidimensional array object.
If E is an n-dimensional array (n ≥ 2) with dimensions i × j × . . . × k, then E (used as
other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with
dimensions j × . . . × k. If the unary * operator is applied to this pointer explicitly, or
implicitly as a result of subscripting, the result is the pointed-to (n − 1)-dimensional array,
which itself is converted into a pointer if used as other than an lvalue. It follows from this
that arrays are stored in row-major order (last subscript varies fastest).
Array type
— An array type describes a contiguously allocated nonempty set of objects with a
particular member object type, called the element type.36) Array types are
characterized by their element type and by the number of elements in the array. An
array type is said to be derived from its element type, and if its element type is T , the
array type is sometimes called ‘‘array of T ’’. The construction of an array type from
an element type is called ‘‘array type derivation’’.
When declaring a function that accesses several consecutive values in memory, I usually use array arguments like
f(int a[4]);
It works fine for my purposes. However, I recently read the opinion of Linus Torvalds.
So I wonder if the array arguments are today considered obsolete? More particularly,
is there any case where the compiler can utilize this information (array size) to check out-of-bound access, or
is there any case where this technique brings some optimization opportunities?
In any case, what about pointers to arrays?
void f(int (*a)[4]);
Note that this form is not prone to "sizeof" mistakes. But what about efficiency in this case? I know that GCC generates the same code (link). Is that always so? And what about further optimization opportunities in this case?
If you write
void f(int a[4]);
that has exactly the same meaning to the compiler as if you wrote
void f(int *a);
This is why Linus has the opinion that he does. The [4] looks like it defines the expected size of the array, but it doesn't. Mismatches between what the code looks like it means and what it actually means are very bad when you're trying to maintain a large and complicated program.
(In general I advise people not to assume that Linus' opinions are correct. In this case I agree with him, but I wouldn't have put it so angrily.)
Since C99, there is a variation that does mean what it looks like it means:
void f(int a[static 4]);
That is, all callers of f are required to supply a pointer to an array of at least four ints; if they don't, the program has undefined behavior. This can help the optimizer, at least in principle (e.g. maybe it means the loop over a[i] inside f can be vectorized).
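A tiny sketch of the static form in use; sum4 is a made-up name. Note that the compiler is not obliged to diagnose a caller that passes fewer than four elements, so this is a promise by the caller more than a check.

```c
/* The caller promises a points to at least 4 ints, so reading
   a[0] through a[3] without a separate length is justified. */
int sum4(const int a[static 4])
{
    return a[0] + a[1] + a[2] + a[3];
}
```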
Your alternative construct
void f(int (*a)[4]);
gives the parameter a a different type ('pointer to array of 4 int' rather than 'pointer to int'). The array-notation equivalent of this type is
void f(int a[][4]);
Written that way, it should be immediately clear that that declaration is appropriate when the argument to f is a two-dimensional array whose inner size is 4, but not otherwise.
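The difference between the two parameter types can be seen without any sizeof tricks on an array parameter. The names first, first_ptr and row_bytes below are invented for this sketch.

```c
#include <stddef.h>

/* The [4] is adjusted away: this parameter has type int *. */
int first(int a[4])
{
    return a[0];
}

/* No cast is needed, because first already has type int (int *). */
int (*const first_ptr)(int *) = first;

/* With a pointer to array, the bound is part of the pointed-to type. */
size_t row_bytes(int (*a)[4])
{
    return sizeof *a;   /* size of int[4], not of a pointer */
}
```

With int v[4], the calls are first_ptr(v) and row_bytes(&v); passing v to row_bytes would be a type mismatch.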
sizeof issues are another can of worms; my recommendation is to avoid needing to use sizeof on function arguments at almost any cost. Do not contort the parameter list of a function to make sizeof come out "right" inside the function; that makes it harder to call the function correctly, and you probably call the function a lot more times than you implement it.
Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element in the array.
When you pass an array expression as an argument to a function:
int arr[100];
...
foo( arr );
what the function actually receives is a pointer to the first element of the array, not a copy of the array. The behavior is exactly the same as if you had written
foo( &arr[0] );
There's a rule that function parameters of type T a[N] or T a[] are "adjusted" to T *a, so if your function declaration is
void foo( int a[100] )
it will be interpreted as though you wrote
void foo( int *a )
There are a couple of significant consequences of this:
Arrays are implicitly passed "by reference" to functions, so changes to the array contents in the function are reflected in the caller (unlike literally every other type);
You can't use sizeof to determine how many elements are in the passed array because there's no way to get that information from a pointer. If your function needs to know the physical size of the array in order to use it properly, then you must pass that length as a separate parameter [1].
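A minimal sketch of the second point, with the element count passed alongside the pointer. The function name sum is hypothetical.

```c
#include <stddef.h>

/* The callee only sees a pointer, so the element count n
   has to travel as its own parameter. */
long sum(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```

In the caller, where arr is still a real array, the count can be computed as sizeof arr / sizeof arr[0] and passed as sum(arr, sizeof arr / sizeof arr[0]).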
In my own code, I do not use array-style declarations in function parameter lists - what the function receives is a pointer, so I use pointer-style declarations. I can see the argument for using array-style declarations, mostly as a matter of documentation (this function is expecting an array of this size), but I think it's valuable to reinforce the pointer-ness of the parameter.
Note that you have the same problem with pointers to arrays - if I call
foo( &arr );
then the prototype for foo needs to be
void foo( int (*a)[100] );
But that's also the same prototype as if I had called it as
int bar[10][100];
foo( bar );
Just like you cannot know whether the parameter a points to a single int or the first in a sequence of ints, you can't know whether bar points to a single 100-element array, or to the first in a sequence of 100-element arrays.
[1] This is why the gets function was deprecated after C99 and removed from the standard library in C2011 - there's no way to tell it the size of the target buffer, so it will happily write input past the end of the array and clobber whatever follows. That's why it was such a popular malware exploit.
Consider this code snippet:
void foo(int a[], int b[]){
    static_assert(sizeof(a) == sizeof(int*));
    static_assert(sizeof(b) == sizeof(int*));
    b = a;
    printf("%d", b[1]);
    assert(a == b); // This also works!
}
int a[3] = {[1] = 2}, b[1];
foo(a, b);
Output (no compilation error):
2
I can't get the point why b = a is valid. Even though arrays may decay to pointers, shouldn't they decay to const pointers (T * const)?
They can't.
Arrays cannot be assigned to. There are no arrays in the foo function. The syntax int a[] in a function parameter list means to declare that a has type "pointer to int". The behaviour is exactly the same as if the code were void foo(int *a, int *b). (C11 6.7.6.3/7)
It is valid to assign one pointer to another. The result is that both pointers point to the same location.
Even though arrays may decay to pointers, shouldn't they decay to const pointers (T * const)?
The pointer that results from array "decay" is an rvalue. The const qualifier is only meaningful for lvalues (C11 6.7.3/4). (The term "decay" refers to conversion of the argument, not the adjustment of the parameter).
Quoting C11, chapter §6.7.6.3, Function declarators (including prototypes)
A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to
type’’, where the type qualifiers (if any) are those specified within the [ and ] of the
array type derivation. [...]
So, a and b are actually pointers, not arrays.
There's no assignment to any array type happening here, hence there's no problem with the code.
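The "type qualifiers (if any) ... within the [ and ]" part of that quote is easy to overlook, so here is a small sketch of both cases. The function names second_via_b and second_const are made up.

```c
/* a and b are plain int * here, so assigning one to the other is fine. */
int second_via_b(int a[], int b[])
{
    b = a;          /* only the local pointer b changes */
    return b[1];
}

/* A qualifier inside [] applies to the adjusted pointer:
   this parameter has type int * const. */
int second_const(int a[const])
{
    /* a = 0;  would be rejected, because a itself is const-qualified */
    return a[1];
}
```

The caller's arrays are unaffected by b = a; only the callee's copy of the pointer is redirected.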
Yes, it would have made sense for array parameters declared with [] to be adjusted to const-qualified pointers. However, const did not exist when this behavior was established.
When the C language was being developed, it made sense to pass an array by passing its address, or, more specifically, the address of the first element. You certainly did not want to copy the entire array to pass it. Passing the address was an easy way to make the array known to the called function. (The semantics for the reference types we see in C++ had not been invented yet.) To make that easy for programmers, so that they could write foo(ArrayA, ArrayB) instead of foo(&ArrayA[0], &ArrayB[0]), the mechanism of converting an array to a pointer to its first element was invented. (Per M.M. and The Development of the C Language by Dennis M. Ritchie, this notation for parameters already existed in C’s predecessor language, B.)
That is fine, you have hidden the conversion. But that is only where the function is called. In the called routine, the programmer who is thinking about passing an array is going to write void foo(int ArrayA[], int ArrayB[]). But since we are actually passing pointers, not arrays, these need to be changed to int *ArrayA and int *ArrayB. So the notion that parameters declared as arrays are automatically adjusted to pointers was created.
As you observe, this leaves the programmer able to assign values to the parameters, which changes the apparent base address of the array. It would have made sense for a parameter declared as int ArrayA[] to be adjusted to int * const ArrayA, so that the value of the parameter ArrayA could not be changed. Then it would act more like an array, whose address also cannot be changed, so this better fits the goal of pretending to pass arrays even though we are passing addresses.
However, at the time, const did not exist, so this was not possible, and nobody thought of inventing const at that time (or at least did not work on it enough to get it adopted into the language).
Now there is a large amount of source code in the world that works with the non-const adjustment. Changing the specification of the C language now would cause problems with the existing code.
I think that it is because the former is an array of pointers to char and the latter is a pointer to an array of chars, and we need to properly specify the size of the object being pointed to for our function definition. In the former;
function(char * p_array[])
the size of the object being pointed to is already included (it's a pointer to char), but the latter
function(char (*p_array)[])
needs the size of the array p_array points to as part of p_array's definition?
I'm at the stage where I've been thinking about this for too long and have just confused myself, someone please let me know if my reasoning is correct.
Both are valid in C but not C++. You would ordinarily be correct:
char *x[]; // array of pointers to char
char (*y)[]; // pointer to array of char
However, the arrays decay to pointers if they appear as function parameters. So they become:
char **x; // Changes to pointer to pointer to char
char (*y)[]; // No decay, since it's NOT an array, it's a pointer to an array
In an array type in C, one of the sizes is permitted to be unspecified. This must be the leftmost one (whoops, I said rightmost at first). So,
int valid_array[][5]; // Ok
int invalid_array[5][]; // Wrong
(You can chain them... but we seldom have reason to do so...)
int (*convoluted_array[][5])[][10];
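One common place where the omitted leftmost size shows up is a definition with an initializer, which completes the type. A small sketch, with grid and grid_rows as made-up names:

```c
#include <stddef.h>

/* The leftmost bound is omitted and taken from the initializer (2 rows);
   the inner bound must still be written out. */
static const int grid[][3] = {
    {1, 2, 3},
    {4, 5, 6},
};

/* Works because grid is a complete array type at this point. */
size_t grid_rows(void)
{
    return sizeof grid / sizeof grid[0];
}
```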
There is a catch, and the catch is that an array type with [] in it is an incomplete type. You can pass around a pointer to an incomplete type but certain operations will not work, as they need a complete type. For example, this will not work:
void func(int (*x)[])
{
    x[2][5] = 900; // Error
}
This is an error because in order to find the address of x[2], the compiler needs to know how big x[0] and x[1] are. But x[0] and x[1] have type int [] -- an incomplete type with no information about how big it is. This becomes clearer if you imagine what the "un-decayed" version of the type would be, which is int x[][] -- obviously invalid C. If you want to pass a two-dimensional array around in C, you have a few options:
Pass a one-dimensional array with a size parameter.
void func(int n, int x[])
{
    x[2*n + 5] = 900;
}
Use an array of pointers to rows. This is somewhat clunky if you have genuine 2D data.
void func(int *x[])
{
    x[2][5] = 900;
}
Use a fixed size.
void func(int x[][5])
{
    x[2][4] = 900; // rows hold 5 ints, so the last valid column index is 4
}
Use a variable length array (C99 only, so it probably doesn't work with Microsoft compilers).
// There's some funny syntax if you want 'x' before 'width'
void func(int n, int x[][n])
{
    x[2][5] = 900;
}
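The "array of pointers to rows" option above needs someone to build the row pointers when the data really is a 2D array. A sketch of that extra step, assuming VLA support; make_row_pointers and set_cell are hypothetical names.

```c
#include <stddef.h>

/* Fill out[r] with the address of the first element of row r. */
void make_row_pointers(size_t rows, size_t cols, int m[rows][cols], int *out[])
{
    for (size_t r = 0; r < rows; ++r)
        out[r] = m[r];
}

/* Same parameter shape as the "array of pointers to rows" option. */
void set_cell(int *x[], size_t r, size_t c, int value)
{
    x[r][c] = value;
}
```

This is the clunky part: the caller keeps a second array (one int * per row) alongside the data just to be able to write x[r][c].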
This is a frequent problem area even for C veterans. Many languages lack intrinsic "out-of-the-box" support for real, variable size, multidimensional arrays (C++, Java, Python) although a few languages do have it (Common Lisp, Haskell, Fortran). You'll see a lot of code that uses arrays of arrays or that calculates array offsets manually.
NOTE:
The answer below was added when the question was tagged C++, and it answers from a C++ perspective. With the tags changed to C only, both of the mentioned samples are valid in C.
Yes, Your reasoning is correct.
If you try compiling, the error given by the compiler is:
parameter ‘p_array’ includes pointer to array of unknown bound ‘char []’
In C++, array sizes need to be fixed at compile time. The C++ standard forbids variable-length arrays (VLAs) as well. Some compilers support them as an extension, but that is not standard-conforming.
Those two declarations are very different. In a function parameter declaration, a declarator of [] directly applied to the parameter name is completely equivalent to a *, so your first declaration is exactly the same in all respects as this:
function(char **p_array);
However, this does not apply recursively to parameter types. Your second parameter has type char (*)[], which is a pointer to an array of unknown size - it is a pointer to an incomplete type. You can happily declare variables with this type - the following is a valid variable declaration:
char (*p_array)[];
Just like a pointer to any other incomplete type, you cannot perform any pointer arithmetic on this variable (or your function parameter) - that's where your error arises. Note that the [] operator is specified as a[i] being identical to *(a+i), so that operator cannot be applied to your pointer. You can, of course, happily use it as a pointer, so this is valid:
void function(char (*p_array)[])
{
    printf("p_array = %p\n", (void *)p_array);
}
This type is also compatible with a pointer to any other fixed-size array of char, so you can also do this:
void function(char (*p_array)[])
{
    char (*p_a_10)[10] = p_array;
    puts(*p_a_10);
}
...and even this:
void function(char (*p_array)[])
{
    puts(*p_array);
}
(though there is precious little point in doing so: you might as well just declare the parameter with type char *).
Note that although *p_array is allowed, p_array[0] is not.
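A compact sketch of that last point, using a hypothetical length_of function: dereferencing the pointer is fine, because the resulting array immediately decays to char *, while indexing the pointer would need the size of the incomplete type.

```c
#include <stddef.h>
#include <string.h>

/* p_array points to an array of unknown bound. */
size_t length_of(char (*p_array)[])
{
    /* *p_array has type char[] and decays to char *, so this is allowed.
       p_array[0] would be *(p_array + 0), which needs sizeof(char[]). */
    return strlen(*p_array);
}
```

A caller with char buf[10] passes &buf; char (*)[10] is compatible with char (*)[].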
Because,
(1) function(char * p_array[])
is equivalent to char **p_array; i.e. a double pointer which is valid.
(2) function(char (*p_array)[])
You are right that p_array is a pointer to a char array. But the array needs to have a fixed size when it appears as a function argument. If you provide the size, the declaration becomes valid as well.
I imagine we all agree that it is considered idiomatic C to access a true multidimensional array by dereferencing a (possibly offset) pointer to its first element in a one-dimensional fashion, e.g.:
void clearBottomRightElement(int *array, int M, int N)
{
array[M*N-1] = 0; // Pretend the array is one-dimensional
}
int mtx[5][3];
...
clearBottomRightElement(&mtx[0][0], 5, 3);
However, the language-lawyer in me needs convincing that this is actually well-defined C! In particular:
Does the standard guarantee that the compiler won't put padding in-between e.g. mtx[0][2] and mtx[1][0]?
Normally, indexing off the end of an array (other than one-past the end) is undefined (C99, 6.5.6/8). So the following is clearly undefined:
struct {
int row[3]; // The object in question is an int[3]
int other[10];
} foo;
int *p = &foo.row[7]; // ERROR: A crude attempt to get &foo.other[4];
So by the same rule, one would expect the following to be undefined:
int mtx[5][3];
int (*row)[3] = &mtx[0]; // The object in question is still an int[3]
int *p = &(*row)[7]; // Why is this any better?
So why should this be defined?
int mtx[5][3];
int *p = &(&mtx[0][0])[7];
So what part of the C standard explicitly permits this? (Let's assume c99 for the sake of discussion.)
EDIT
Note that I have no doubt that this works fine in all compilers. What I'm querying is whether this is explicitly permitted by the standard.
All arrays (including multidimensional ones) are padding-free. Even if it's never explicitly mentioned, it can be inferred from sizeof rules.
Now, array subscription is a special case of pointer arithmetics, and C99 section 6.5.6, §8 states clearly that behaviour is only defined if the pointer operand and the resulting pointer lie in the same array (or one element past), which makes bounds-checking implementations of the C language possible.
This means that your example is, in fact, undefined behaviour. However, as most C implementations do not check bounds, it will work as expected - most compilers treat undefined pointer expressions like
mtx[0] + 5
identically to well-defined counterparts like
(int *)((char *)mtx + 5 * sizeof (int))
which is well-defined because any object (including the whole two-dimensional array) can always be treated as a one-dimensinal array of type char.
On further reflection on the wording of section 6.5.6, splitting an out-of-bounds access into seemingly well-defined subexpressions like
(mtx[0] + 3) + 2
reasoning that mtx[0] + 3 is a pointer to one element past the end of mtx[0] (making the first addition well-defined) as well as a pointer to the first element of mtx[1] (making the second addition well-defined) is incorrect:
Even though mtx[0] + 3 and mtx[1] + 0 are guaranteed to compare equal (see section 6.5.9, §6), they are semantically different. For example, the former can't be dereferenced and thus does not point to an element of mtx[1].
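A small sketch of the one thing that is guaranteed here, namely that the two pointers compare equal (the function name is invented for this example):

```c
#include <assert.h>

/* Returns 1 if the one-past-the-end pointer of row 0 compares equal to
   the pointer to the first element of row 1. */
int one_past_equals_next_row(void)
{
    int mtx[5][3] = {{0}};
    int *end_of_row0 = mtx[0] + 3;    /* one past the end: may be compared, not dereferenced */
    int *start_of_row1 = mtx[1] + 0;  /* first element of row 1: may be dereferenced */
    return end_of_row0 == start_of_row1;
}
```

The comparison says nothing about which operations are allowed on each pointer afterwards, which is the point made above.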
The only obstacle to the kind of access you want to do is that objects of type int [5][3] and int [15] are not allowed to alias one another. Thus if the compiler is aware that a pointer of type int * points into one of the int [3] arrays of the former, it could impose array bounds restrictions that would prevent accessing anything outside that int [3] array.
You might be able to get around this issue by putting everything inside a union that contains both the int [5][3] array and the int [15] array, but I'm really unclear on whether the union hacks people use for type-punning are actually well-defined. This case might be slightly less problematic since you would not be type-punning individual cells, only the array logic, but I'm still not sure.
One special case that should be noted: if your type were unsigned char (or any char type), accessing the multi-dimensional array as a one-dimensional array would be perfectly well-defined. This is because the one-dimensional array of unsigned char that overlaps it is explicitly defined by the standard as the "representation" of the object, and is inherently allowed to alias it.
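A minimal sketch of that special case, assuming an unsigned char matrix with the same 5 by 3 shape as in the question (the function name is hypothetical):

```c
#include <assert.h>

/* Writes the bottom-right cell of an unsigned char[5][3] through a flat
   unsigned char view of the whole object, then reads it back normally. */
unsigned char last_cell_via_flat_view(void)
{
    unsigned char mtx[5][3] = {{0}};
    unsigned char *flat = (unsigned char *)mtx;  /* byte representation of the whole array */
    flat[5 * 3 - 1] = 7;                         /* last of the 15 bytes */
    return mtx[4][2];
}
```

The cast is applied to the whole array rather than to a single row, so the flat pointer ranges over all 15 bytes of the object.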
It is certain that there is no padding between the elements of an array.
There are provisions for doing address computation in a smaller size than the full address space. This could be used, for instance, in the huge mode of the 8086, so that the segment part would not always be updated if the compiler knew that you couldn't cross a segment boundary. (It was too long ago for me to remember whether the compilers I used took advantage of that or not.)
With my internal model -- I'm not sure it is perfectly the same as the standard one and it is too painful to check, the information being distributed everywhere --
what you are doing in clearBottomRightElement is valid.
int *p = &foo.row[7]; is undefined
int i = mtx[0][5]; is undefined
int *p = &row[7]; doesn't compile (gcc agrees with me)
int *p = &(&mtx[0][0])[7]; is in the gray zone (the last time I checked something like this in detail, I ended up considering it invalid in C90 and valid in C99; that could be the case here, or I could have missed something).
My understanding of the C99 standard is that there is no requirement that multidimensional arrays be laid out contiguously in memory. This follows from the only relevant information I found in the standard, quoted below (each dimension is guaranteed to be contiguous).
If you want to use the x[COLS*r + c] access, I suggest you stick to single dimension arrays.
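A sketch of that suggestion, assuming the 5 by 3 shape from the question; ROWS, COLS, idx and the function names are placeholders chosen for this example:

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 5, COLS = 3 };

/* Row-major index into a flat array holding ROWS * COLS elements. */
size_t idx(size_t r, size_t c)
{
    return COLS * r + c;
}

/* The object really is a one-dimensional int array here, so indexing
   up to rows * cols - 1 stays inside its bounds. */
void clear_bottom_right(int *x, size_t rows, size_t cols)
{
    x[rows * cols - 1] = 0;
}

int demo_flat_matrix(void)
{
    int x[ROWS * COLS];
    for (size_t i = 0; i < ROWS * COLS; ++i)
        x[i] = (int)i + 1;               /* values 1..15 */
    clear_bottom_right(x, ROWS, COLS);
    /* The bottom-right cell is cleared; cell (1, 2) still holds 6. */
    return x[idx(ROWS - 1, COLS - 1)] == 0 && x[idx(1, 2)] == 6;
}
```

With this layout the question about crossing row boundaries does not arise, at the cost of writing the index computation by hand.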
Array subscripting
Successive subscript operators designate an element of a multidimensional array object.
If E is an n-dimensional array (n ≥ 2) with dimensions i × j × . . . × k, then E (used as
other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with
dimensions j × . . . × k. If the unary * operator is applied to this pointer explicitly, or
implicitly as a result of subscripting, the result is the pointed-to (n − 1)-dimensional array,
which itself is converted into a pointer if used as other than an lvalue. It follows from this
that arrays are stored in row-major order (last subscript varies fastest).
Array type
— An array type describes a contiguously allocated nonempty set of objects with a
particular member object type, called the element type.36) Array types are
characterized by their element type and by the number of elements in the array. An
array type is said to be derived from its element type, and if its element type is T , the
array type is sometimes called ‘‘array of T ’’. The construction of an array type from
an element type is called ‘‘array type derivation’’.