I imagine we all agree that it is considered idiomatic C to access a true multidimensional array by dereferencing a (possibly offset) pointer to its first element in a one-dimensional fashion, e.g.:
void clearBottomRightElement(int *array, int M, int N)
{
array[M*N-1] = 0; // Pretend the array is one-dimensional
}
int mtx[5][3];
...
clearBottomRightElement(&mtx[0][0], 5, 3);
However, the language-lawyer in me needs convincing that this is actually well-defined C! In particular:
Does the standard guarantee that the compiler won't put padding in-between e.g. mtx[0][2] and mtx[1][0]?
Normally, indexing off the end of an array (other than one-past the end) is undefined (C99, 6.5.6/8). So the following is clearly undefined:
struct {
int row[3]; // The object in question is an int[3]
int other[10];
} foo;
int *p = &foo.row[7]; // ERROR: A crude attempt to get &foo.other[4];
So by the same rule, one would expect the following to be undefined:
int mtx[5][3];
int (*row)[3] = &mtx[0]; // The object in question is still an int[3]
int *p = &(*row)[7]; // Why is this any better?
So why should this be defined?
int mtx[5][3];
int *p = &(&mtx[0][0])[7];
So what part of the C standard explicitly permits this? (Let's assume C99 for the sake of discussion.)
EDIT
Note that I have no doubt that this works fine in all compilers. What I'm querying is whether this is explicitly permitted by the standard.
All arrays (including multidimensional ones) are padding-free. Even if it's never explicitly mentioned, it can be inferred from sizeof rules.
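The sizeof argument can be made concrete with a small check. This is only a sketch: the helper name byte_offset is made up for illustration, and the distances are measured through char pointers so that the check itself never indexes past the end of a row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of mtx[r][c] from the start of an
   int[5][3], measured through char pointers. */
size_t byte_offset(size_t r, size_t c)
{
    static int mtx[5][3];

    assert(r < 5 && c < 3);
    return (size_t)((char *)&mtx[r][c] - (char *)mtx);
}
```

If there were padding between rows, byte_offset(1, 0) would be larger than 3 * sizeof(int), and sizeof(int[5][3]) would be larger than 15 * sizeof(int).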
Now, array subscripting is a special case of pointer arithmetic, and C99 section 6.5.6, §8 states clearly that behaviour is only defined if the pointer operand and the resulting pointer lie in the same array (or one element past), which makes bounds-checking implementations of the C language possible.
This means that your example is, in fact, undefined behaviour. However, as most C implementations do not check bounds, it will work as expected - most compilers treat undefined pointer expressions like
mtx[0] + 5
identically to well-defined counterparts like
(int *)((char *)mtx + 5 * sizeof (int))
which is well-defined because any object (including the whole two-dimensional array) can always be treated as a one-dimensional array of type char.
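A minimal sketch of the byte-arithmetic form, assuming an int[5][3] as in the question. The name nth_int is made up for illustration, and the sketch takes this answer's reading of the standard at face value rather than settling the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: address of the k-th int of an int[5][3],
   computed by byte arithmetic over the whole object rather than by
   indexing past the end of one row. */
int *nth_int(int (*mtx)[3], size_t k)
{
    assert(k < 5 * 3);  /* stay inside the 15-element object */
    return (int *)((char *)mtx + k * sizeof(int));
}
```

With int mtx[5][3], nth_int(mtx, 5) yields the same address as &mtx[1][2].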
On further meditation on the wording of section 6.5.6, splitting an out-of-bounds access into seemingly well-defined subexpressions like
(mtx[0] + 3) + 2
on the reasoning that mtx[0] + 3 is a pointer to one element past the end of mtx[0] (making the first addition well-defined) as well as a pointer to the first element of mtx[1] (making the second addition well-defined), is incorrect:
Even though mtx[0] + 3 and mtx[1] + 0 are guaranteed to compare equal (see section 6.5.9, §6), they are semantically different. For example, the former can't be dereferenced and thus does not point to an element of mtx[1].
The only obstacle to the kind of access you want to do is that objects of type int [5][3] and int [15] are not allowed to alias one another. Thus if the compiler is aware that a pointer of type int * points into one of the int [3] arrays of the former, it could impose array bounds restrictions that would prevent accessing anything outside that int [3] array.
You might be able to get around this issue by putting everything inside a union that contains both the int [5][3] array and the int [15] array, but I'm really unclear on whether the union hacks people use for type-punning are actually well-defined. This case might be slightly less problematic since you would not be type-punning individual cells, only the array logic, but I'm still not sure.
One special case that should be noted: if your type were unsigned char (or any char type), accessing the multi-dimensional array as a one-dimensional array would be perfectly well-defined. This is because the one-dimensional array of unsigned char that overlaps it is explicitly defined by the standard as the "representation" of the object, and is inherently allowed to alias it.
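As a sketch of that special case, the following walks a two-dimensional array of unsigned char through its byte representation. The helper name sum_bytes and the row length of 3 are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums every byte of a rows x 3 grid of unsigned
   char by walking the object's byte representation from start to end. */
unsigned sum_bytes(unsigned char (*grid)[3], size_t rows)
{
    const unsigned char *bytes = (const unsigned char *)grid;
    size_t n = rows * sizeof grid[0];   /* total bytes in the grid */
    unsigned total = 0;

    assert(grid != NULL);
    for (size_t i = 0; i < n; i++)
        total += bytes[i];
    return total;
}
```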
It is certain that there is no padding between the elements of an array.
There are provisions for doing address computation in a smaller size than the full address space. This could be used, for instance, in the huge mode of the 8086, so that the segment part would not always be updated if the compiler knew that you couldn't cross a segment boundary. (It was too long ago for me to recall whether the compilers I used took advantage of that.)
With my internal model -- I'm not sure it is perfectly the same as the standard one and it is too painful to check, the information being distributed everywhere --
what you are doing in clearBottomRightElement is valid.
int *p = &foo.row[7]; is undefined
int i = mtx[0][5]; is undefined
int *p = &row[7]; doesn't compile (gcc agrees with me)
int *p = &(&mtx[0][0])[7]; is in the gray zone (the last time I checked something like this in detail, I ended up considering it invalid in C90 and valid in C99; that could be the case here, or I could have missed something).
My understanding of the C99 standard is that there is no requirement that multidimensional arrays must be laid out in a contiguous order in memory. What follows is the only relevant information I found in the standard (each dimension is guaranteed to be contiguous).
If you want to use the x[COLS*r + c] access, I suggest you stick to single dimension arrays.
Array subscripting
Successive subscript operators designate an element of a multidimensional array object.
If E is an n-dimensional array (n ≥ 2) with dimensions i × j × . . . × k, then E (used as
other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with
dimensions j × . . . × k. If the unary * operator is applied to this pointer explicitly, or
implicitly as a result of subscripting, the result is the pointed-to (n − 1)-dimensional array,
which itself is converted into a pointer if used as other than an lvalue. It follows from this
that arrays are stored in row-major order (last subscript varies fastest).
Array type
— An array type describes a contiguously allocated nonempty set of objects with a
particular member object type, called the element type.36) Array types are
characterized by their element type and by the number of elements in the array. An
array type is said to be derived from its element type, and if its element type is T, the
array type is sometimes called ‘‘array of T’’. The construction of an array type from
an element type is called ‘‘array type derivation’’.
I was playing around with some arrays and pointers in C and started wondering whether doing this would be undefined behavior.
int (*arr)[5] = malloc(sizeof(int[5][5]));
// Is this undefined behavior?
int val0 = arr[0][5];
// Rephrased, is it guaranteed it'll always have the same effect as this line?
int val1 = arr[1][0];
Thank you for any insights.
In C, what you're doing is undefined behavior.
The expression arr[0] has type int [5]. So the expression arr[0][5] dereferences one element past the end of the array arr[0], and dereferencing past the end of an array is undefined behavior.
Section 6.5.2.1p2 of the C standard regarding Array Subscripting states:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).
And section 6.5.6p8 of the C standard regarding Additive Operators states:
When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N
(equivalently, N+(P)) and (P)-N (where N has the value n)
point to, respectively, the i+n-th and i−n-th elements of the
array object, provided they exist. Moreover, if the
expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array
object, and if the expression Q points one past the last
element of an array object, the expression (Q)-1 points to the
last element of the array object. If both the pointer operand
and the result point to elements of the same array object,
or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined.
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is
evaluated.
The final two sentences specify that the addition implicit in an array subscript may not result in a pointer more than one element past the end of an array, and that a pointer to one element past the end of an array may not be dereferenced.
The fact that the array in question is itself a member of an array, meaning the elements of each subarray are contiguous in memory, doesn't change this. Aggressive optimization settings in the compiler may note that it is undefined behavior to access past the end of the array and make optimizations based on this fact.
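If the goal is to walk the elements in row-major order, one way to stay within the rules quoted above is to split the flat position into a row and a column, so no subscript ever exceeds its own bound. This is a sketch; the names COLS and get_flat are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { COLS = 5 };  /* row length, matching int (*arr)[5] in the question */

/* Hypothetical helper: read the k-th element in row-major order
   without ever indexing past the end of a row. */
int get_flat(int (*arr)[COLS], size_t rows, size_t k)
{
    assert(k < rows * COLS);
    return arr[k / COLS][k % COLS];
}
```

Here get_flat(arr, 5, 5) reads arr[1][0], which is the element the question hoped to reach with arr[0][5].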
The Standard is clearly intended to avoid requiring that a compiler given something like:
int foo[5][10];
int test(int i)
{
foo[1][0] = 1;
foo[0][i] = 2;
return foo[1][0];
}
must reload the value of foo[1][0] to accommodate the possibility that the write to foo[0][i] might affect foo[1][0]. On the other hand, before the Standard was written, it would have been idiomatic to write something like:
void dump_array(int *p, int rows, int cols)
{
int i,j;
for (i=0; i<rows; i++)
{
for (j=0; j<cols; j++)
printf("%6d", *p++);
printf("\n");
}
}
int foo[5][10];
...
dump_array(foo[0], 5, 10);
and nothing in the published Rationale suggests that the authors had any intention of forbidding such constructs nor breaking code that used them. Indeed, the primary benefit of requiring that rows of an array be placed consecutively, even when adding padding would improve efficiency, is to allow such code to function.
At the time the Standard was written, when generating code for a function that received a pointer, compilers would treat the pointer as though it might identify some arbitrary part of some arbitrary larger object, without making any effort to know or care about what that enclosing object might be. They would thus, as a very popular form of "conforming language extension", support constructs like dump_array without regard for whether the Standard required them to do so, and consequently the authors of the Standard saw no reason to worry about when the Standard mandated such support. Instead, they left such matters as a Quality of Implementation issue over which the Standard could waive jurisdiction.
Unfortunately, because the authors of the Standard expected that compilers would treat the act of passing a pointer to a function as implicitly "laundering" it, the authors of the Standard saw no need to define any explicit method for laundering information about a pointer's enclosing objects in cases where it would be necessary for a function to treat a pointer identifying "raw" storage. Such distinctions didn't matter given the state of compiler technology in the 1980s, but may be quite relevant if e.g. code does something like:
int matrix[10][10];
void test2(int c)
{
matrix[4][0] = 1;
dump_array(matrix[0], 1, c);
matrix[4][0] = 2;
}
or
void test3(int r)
{
matrix[4][0] = 1;
dump_array((int*)matrix, r, 10);
matrix[4][0] = 2;
}
Depending upon what the function is intended to do, having a compiler optimize out the first write to matrix[4][0] in one or both may improve efficiency, or it may cause the generated code to behave uselessly. Treating explicit pointer conversions as erasing type information, but treating array-to-pointer decay as retaining it, would allow programmers to achieve required semantics if they write code as in the second example, while allowing compilers to perform the relevant optimizations when source code is written as in the first example. Unfortunately, the Standard makes no distinctions, and maintainers of free compilers are loath to forgo any "optimizations" they view the Standard as giving them, leaving the language with nothing but "hope for the best" semantics except on implementations that either refrain from cross-procedural optimizations or document what needs to be done to block them.
I heard from a friend that two dimensional arrays in C are only supported syntactically.
He told me that it is better to use float arr[M * N] instead of float[M][N] because C compilers such as gcc can't guarantee that the data is laid out contiguously in memory on every system/platform.
I want to use this as an argument in my master's thesis, but I don't have any reference.
So first question:
Is what he's saying right?
Second question:
Do you know if there is a book or an article where to find this statement?
Thanks + Regards
No, he's wrong.
Look at the C standard. Some relevant bits (bold emphasis mine):
6.2.5 Types ¶20
An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type.
6.7.6.2 Array declarators ¶3 (note 142)
When several "array of" specifications are adjacent, a multidimensional array is declared.
6.5.2.1 Array subscripting ¶3
Successive subscript operators designate an element of a multidimensional array object. ... It follows from this that arrays are stored in row-major order (last subscript varies fastest).
And perhaps most explicitly, the example in 6.5.2.1 Array subscripting ¶4:
EXAMPLE Consider the array object defined by the declaration
int x[3][5];
Here x is a 3 × 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints. In the expression x[i], which is equivalent to (*((x)+(i))), x is first converted to a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five ints. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j] yields an int.
Multidimensional arrays in C are just "arrays of arrays". They work fine and are 100% defined by the standard.
You may also find it helpful to read Section 6, Arrays and Pointers in the comp.lang.c FAQ.
The issue is a bit more subtle than the other answers make it sound:
While multi-dimensional arrays are (semantically, possibly not physically) contiguous, pointer arithmetic is only defined if you stay within the bounds of the array your pointer originally referenced (actually, you can go 1 element past the upper bound, but only if you don't dereference).
This means that language semantics forbid walking through a multi-dimensional array from start to end, and a bounds-checking implementation of the C language (such implementations are possible in principle but rarely seen in the wild for performance reasons) could raise a segfault, print a diagnostic or make demons fly from your nose whenever you cross a sub-array's boundary.
I'm not sure if compilers use this information for optimization purposes, but in principle, they could. For example, if you have
float *p = &arr[2][3];
float *q = &arr[5][9];
then p + x and q + y should never alias, regardless of the values of x and y.
Section 6.2.5 ¶20 requires that arrays be contiguously allocated. This applies as much to an array of arrays as it does to a single dimensional array.
Your friend is simply wrong.
Built-in multi-dimensional arrays in C are implemented through index translation. This means that, for example, a 3D array T a[M][N][K] is implemented as a 1D array T a_impl[M * N * K], with multi-dimensional access a[i][j][k] being implicitly translated into the single-dimensional access a_impl[((i * N) + j) * K + k]. The language specification does not explicitly describe this implementation; however, the requirements mandate it pretty much directly.
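The translation formula can be compared against what a compiler actually does. In this sketch the dimensions and the function names flat_index and actual_index are chosen arbitrarily for illustration; the second function measures the element position through byte offsets from the start of the array.

```c
#include <assert.h>
#include <stddef.h>

enum { M = 2, N = 3, K = 4 };  /* example dimensions, chosen arbitrarily */

/* Row-major position of a[i][j][k] in the equivalent 1D array. */
size_t flat_index(size_t i, size_t j, size_t k)
{
    assert(i < M && j < N && k < K);
    return ((i * N) + j) * K + k;
}

/* Position of a[i][j][k] as the compiler laid it out, measured in
   elements via byte offsets from the start of the array. */
size_t actual_index(size_t i, size_t j, size_t k)
{
    static int a[M][N][K];

    assert(i < M && j < N && k < K);
    return (size_t)((char *)&a[i][j][k] - (char *)a) / sizeof(int);
}
```

For every valid (i, j, k) the two functions should return the same value.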
Taking this into account, it is not clear why your friend would tell you to use float arr[M * N] explicitly instead of relying on the implicit implementation of the same thing by the compiler.
The situation that might make you consider the float arr[M * N] approach is when both M and N are run-time values and your compiler does not support variable-length arrays (or you for some reason do not want to use them). In such cases the built-in support for multidimensional arrays is no longer applicable, since it relies on all sizes (except the first one) being compile-time constants. Maybe this is what your friend had in mind.
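For that run-time case, a flat allocation with a small accessor might look roughly like this. The Matrix type and the matrix_* function names are hypothetical, introduced only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical runtime-sized matrix stored as one flat block. */
typedef struct {
    size_t rows, cols;
    float *data;          /* rows * cols elements, row-major */
} Matrix;

/* Allocates rows * cols floats with calloc; data is NULL on failure. */
Matrix matrix_create(size_t rows, size_t cols)
{
    Matrix m = { rows, cols, calloc(rows * cols, sizeof(float)) };
    return m;
}

/* Pointer to element (r, c), using the r * cols + c translation. */
float *matrix_at(Matrix *m, size_t r, size_t c)
{
    assert(m->data != NULL && r < m->rows && c < m->cols);
    return &m->data[r * m->cols + c];
}

/* Releases the storage and clears the pointer. */
void matrix_destroy(Matrix *m)
{
    free(m->data);
    m->data = NULL;
}
```

Because data points at a genuinely one-dimensional allocation, the bounds questions raised in the other answers do not come up here.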
Normally accessing an array beyond its end is undefined behavior in C. For example:
int foo[1];
foo[5] = 1; //Undefined behavior
Is it still undefined behavior if I know that the memory area after the end of the array has been allocated, with malloc or on the stack? Here is an example:
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
int len;
int data[1];
} MyStruct;
int main(void)
{
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
foo->data[5] = 1;
}
I have seen this pattern used in several places to make a variable length struct, and it seems to work in practice. Is it technically undefined behavior?
What you are describing is affectionately called "the struct hack". It's not clear if it's completely okay, but it was and is widely used.
As of late (C99), it has started to be replaced by the "flexible array member", where you're allowed to put an int data[]; field if it's the last field in the struct.
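A minimal sketch of the flexible array member form, assuming a C99 compiler. The type name FlexStruct and the helper flex_create are made up here to avoid clashing with MyStruct from the question.

```c
#include <assert.h>
#include <stdlib.h>

/* C99 flexible array member: data[] must be the last member. */
typedef struct {
    int len;
    int data[];
} FlexStruct;

/* Hypothetical helper: allocates a FlexStruct with room for n ints.
   Returns NULL if the allocation fails. */
FlexStruct *flex_create(int n)
{
    assert(n >= 0);
    FlexStruct *s = malloc(sizeof *s + sizeof(int) * (size_t)n);
    if (s != NULL)
        s->len = n;
    return s;
}
```

After FlexStruct *foo = flex_create(10);, writing foo->data[5] = 1; is within the bounds of the allocated array, and the caller releases the object with free(foo).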
Under 6.5.6 Additive operators:
Semantics
8 - [...] If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. [...] If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
If the memory is allocated by malloc then:
7.22.3 Memory management functions
1 - [...] The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation.
This does not however countenance the use of such memory without an appropriate cast, so for MyStruct as defined above only the declared members of the object can be used. This is why flexible array members (6.7.2.1:18) were added.
Also note that appendix J.2 Undefined behavior calls out array access:
1 - The behavior is undefined in the following circumstances: [...]
— Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that does not point into, or just beyond, the same array
object.
— Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that points just beyond the array object and is used as
the operand of a unary * operator that is evaluated.
— An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]).
So, as you note this would be undefined behaviour:
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
foo->data[5] = 1;
However, you would be allowed to do the following:
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
((int *) foo)[(offsetof(MyStruct, data) / sizeof(int)) + 5] = 1;
C++ is laxer in this regard; 3.9.2 Compound types [basic.compound] has:
3 - [...] If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
This makes sense considered in the light of C's more aggressive optimisation opportunities for pointers, e.g. with the restrict qualifier.
The C99 rationale document talks about this in section 6.7.2.1.
A new feature of C99: There is a common idiom known as the “struct hack” for creating a structure containing a variable-size array:
...
The validity of this construct has always been questionable. In the response to one Defect Report, the Committee decided that it was undefined behavior because the array p->items contains only one item, irrespective of whether the space exists. An alternative construct was suggested: make the array size larger than the largest possible case (for example, using int items[INT_MAX];), but this approach is also undefined for other reasons.
The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced. Apart from the empty brackets, and the removal of the “-1” in the malloc call, this is used in the same way as the struct hack, but is now explicitly valid code.
The struct hack is undefined behavior, as supported not only by the C specification itself (I'm sure there are citations in the other answers) but also by the committee's recorded opinion.
So the answer is yes, it is undefined behavior according to the standard document, but it is well defined according to the de facto C standard. I imagine most compiler writers are intimately familiar with the hack. From GCC's tree-vrp.c:
/* Accesses after the end of arrays of size 0 (gcc
extension) and 1 are likely intentional ("struct
hack"). */
I think there's a good chance you might even find the struct hack in compiler test suites.
I have the following code:
int *pa;
int a[3] = {1, 2, 3};
Why is pa = a OK, but a = pa not allowed?
The main difference is that the type of a is still an array; it just decays into a pointer when you do pa = a;. pa will now point to the first element of the array, not to the entire array itself. When you do a = pa it does not make any sense, as you are trying to make an object that holds 3 integers take the value of something that can only point to a single integer.
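A short sketch of the two directions. The function names copy3 and first_via_pointer are invented for the example; the memcpy call shows one common way to get the effect people usually want from assigning to an array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies three ints from src into dst, since
   arrays cannot be assigned to directly. */
void copy3(int dst[3], const int *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, 3 * sizeof *dst);
}

/* Returns the first element of a through a decayed pointer. */
int first_via_pointer(void)
{
    int a[3] = {1, 2, 3};
    int *pa = a;      /* a decays to &a[0]; pa points at the first element */
    /* a = pa;           would not compile: a is not a modifiable lvalue */
    return *pa;
}
```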
Note: This is purely conceptual, this is not the actual reason why this happens.
I like to think of pointer assignment like OOP & Inheritance.
Imagine int * is a generic object. Now, think of int [] as an object that inherits from int *.
As you can see, you can cast down from int [] to int *, but not casting upwards.
Well, the simple answer is that the language definition simply doesn't allow it - it's a design choice.
Chapter and verse:
6.5.16 Assignment operators
...
Constraints
2 An assignment operator shall have a modifiable lvalue as its left operand.
And what's a modifiable lvalue?
6.3.2.1 Lvalues, arrays, and function designators
1 An lvalue is an expression with an object type or an incomplete type other than void;53)
if an lvalue does not designate an object when it is evaluated, the behavior is undefined.
When an object is said to have a particular type, the type is specified by the lvalue used to
designate the object. A modifiable lvalue is an lvalue that does not have array type, does
not have an incomplete type, does not have a const-qualified type, and if it is a structure
or union, does not have any member (including, recursively, any member or element of
all contained aggregates or unions) with a const-qualified type.
...
53) The name ‘‘lvalue’’ comes originally from the assignment expression E1 = E2, in which the left
operand E1 is required to be a (modifiable) lvalue. It is perhaps better considered as representing an
object ‘‘locator value’’. What is sometimes called ‘‘rvalue’’ is in this International Standard described
as the ‘‘value of an expression’’.
Emphasis added.
Array expressions in C are treated differently than most other expressions. The reason for this is explained in an article Dennis Ritchie wrote about the development of the C language:
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
This invention enabled most existing B code to continue to work, despite the underlying shift in the language's semantics. The few programs that assigned new values to an array name to adjust its origin—possible in B and BCPL, meaningless in C—were easily repaired. More important, the new language retained a coherent and workable (if unusual) explanation of the semantics of arrays, while opening the way to a more comprehensive type structure.
It's a good article, and well worth reading if you're interested in the "whys" of C.