How to understand the syntax of C multidimensional arrays?


My intuition when I see int array_name[x][y]; is that it's an array of y arrays, i.e. array_name[x] is one element out of y such elements. But it turns out that's not so (in fact, it's the opposite?).
The guides/tutorials seem hellbent on bringing in matrices to explain this, which makes it specific to 2D arrays. I'm looking to understand the general array_name[w][x] ... [n] syntax.
Note: fine, syntax is syntax, and this is how C defines it, okay. Then, is it true that array_name[w][x] ... [n] is simply an array of w elements each of which is array_name[x] ... [n]? But even this is not entirely correct because int a[][3] = {1,2,3,4,5,6,7}; is valid even though the RHS contains a number of elements not divisible by 3.

int x[5][3];
declares x as an array with 5 elements. Each of these elements is an array of 3 ints. You're correct so far.
But you should compile with -Wall -Wextra. Look here:
k.c:2:16: warning: missing braces around initializer [-Wmissing-braces]
2 | int a[][3] = {1,2,3,4,5,6,7};
| ^
| { }{ }{}
It's valid to initialize it this way, but the more proper way of initializing it is:
int a[][3] = {{1,2,3},{4,5,6},{7,0,0}};
This is much more readable. The zeros are not needed: if you initialize at least one element, all remaining elements are zero-initialized.
One more thing: with multidimensional arrays you can index past the end of a row without leaving the memory of the whole array. DO NOTE THAT EVEN IF THIS IS LIKELY TO WORK, IT'S UNDEFINED BEHAVIOR, SO DON'T DO IT!
a[1][4] == (*(a+1))[4] == *(*(a+1)+4)
This is because [] is simply syntactic sugar for pointer arithmetic. So if you have declared T x[5][3]; for some type T, then x[0][4] refers to the same memory location as x[1][1], since both are 4 elements past x[0][0] (0*3 + 4 == 1*3 + 1).
But as I said, it's UB. Read more about it here

The way to read a multidimensional array declaration like
int arr[N][M];
is as an N-element array of M-element arrays of int. Each arr[i] has type int [M].
We can get there using substitution. Let's start with a simple object declaration:
T a;
a is an instance of something which we call T. Now replace T with the array type R [N]:
R a[N];
Important thing to note - since the [] operator is postfix in both expressions and declarations, when we substitute T with R [N] the [N] goes to the rightmost side of the declarator a, giving us R a[N]; the importance of this will be clear on the next round of substitution.
So now a is an N-element array of something, and that something is type R. Now we replace R with another array type, int [M]:
int a[N][M];
Again, since the [] operator is postfix, we need to add it to the rightmost side of the declarator a[N] when doing the substitution, giving us a[N][M]. a is still an N-element array of something, it's just now that something is "M-element array of int". Hence, a is an N-element array of M-element arrays of int.
But even this is not entirely correct because int a[][3] = {1,2,3,4,5,6,7}; is valid even though the RHS contains a number of elements not divisible by 3.
That's covered here:
6.7.9 Initialization
...
21 If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
C 2011 Online Draft
So that initializer is interpreted as:
int a[][3] = {{1,2,3},{4,5,6},{7,0,0}};

Related

Pick one string from an Array of 4 strings in C [duplicate]

It is stated here that
The term modifiable lvalue is used to emphasize that the lvalue allows the designated object to be changed as well as examined. The following object types are lvalues, but not modifiable lvalues:
An array type
An incomplete type
A const-qualified type
A structure or union type with one of its members qualified as a const type
Because these lvalues are not modifiable, they cannot appear on the left side of an assignment statement.
Why is an array-type object not modifiable? Isn't it correct to write
int i = 5, a[10] = {0};
a[i] = 1;
?
And also, what is an incomplete type?

Assume the declaration
int a[10];
then all of the following are true:
the type of the expression a is "10-element array of int"; except when a is the operand of the sizeof or unary & operators, the expression will be converted to an expression of type "pointer to int" and its value will be the address of the first element in the array;
the type of the expression a[i] is int; it refers to the integer object stored as the i'th element of the array;
The expression a may not be the target of an assignment because C does not treat arrays like other variables, so you cannot write something like a = b or a = malloc(n * sizeof *a) or anything like that.
You'll notice I keep emphasizing the word "expression". There's a difference between the chunk of memory we set aside to hold 10 integers and the symbols (expressions) we use to refer to that chunk of memory. We can refer to it with the expression a. We can also create a pointer to that array:
int (*ptr)[10] = &a;
The expression *ptr also has type "10-element array of int", and it refers to the same chunk of memory that a refers to.
C does not treat array expressions (a, *ptr) like expressions of other types, and one of the differences is that an expression of array type may not be the target of an assignment. You cannot reassign a to refer to a different array object (same for the expression *ptr). You may assign a new value to a[i] or (*ptr)[i] (change the value of each array element), and you may assign ptr to point to a different array:
int b[10], c[10];
.....
ptr = &b;
.....
ptr = &c;
As for the second question...
An incomplete type lacks size information; declarations like
struct foo;
int bar[];
union bletch;
all create incomplete types because there isn't enough information for the compiler to determine how much storage to set aside for an object of that type. You cannot create objects of incomplete type; for example, you cannot declare
struct foo myFoo;
unless you complete the definition for struct foo. However, you can create pointers to incomplete types; for example, you could declare
struct foo *myFooPtr;
without completing the definition for struct foo because a pointer just stores the address of the object, and you don't need to know the type's size for that. This makes it possible to define self-referential types like
struct node {
T key; // for any type T
Q val; // for any type Q
struct node *left;
struct node *right;
};
The type definition for struct node isn't complete until we hit that closing }. Since we can declare a pointer to an incomplete type, we're okay. However, we could not define the struct as
struct node {
... // same as above
struct node left;
struct node right;
};
because the type isn't complete when we declare the left and right members, and also because each left and right member would contain left and right members of its own, each of which would contain left and right members of their own, and on and on and on.
That's great for structs and unions, but what about
int bar[];
???
We've declared the symbol bar and indicated that it will be an array type, but the size is unknown at this point. Eventually we'll have to define it with a size, but this way the symbol can be used in contexts where the array size isn't meaningful or necessary. Don't have a good, non-contrived example off the top of my head to illustrate this, though.
EDIT
Responding to the comments here, since there isn't going to be room in the comments section for what I want to write (I'm in a verbose mood this evening). You asked:
Does it mean every variable is an expression?
It means that any variable can be an expression, or part of an expression. Here's how the language standard defines the term expression:
6.5 Expressions
1 An expression is a sequence of operators and operands that specifies computation of a
value, or that designates an object or a function, or that generates side effects, or that
performs a combination thereof.
For example, the variable a all by itself counts as an expression; it designates the array object we defined to hold 10 integer values. It also evaluates to the address of the first element of the array. The variable a can also be part of a larger expression like a[i]; the operator is the subscript operator [] and the operands are the variables a and i. This expression designates a single member of the array, and it evaluates to the value currently stored in that member. That expression in turn can be part of a larger expression like a[i] = 0.
And also, let me clarify: in the declaration int a[10], does a[] stand for the array type?
Yes, exactly.
In C, declarations are based on the types of expressions, rather than the types of objects. If you have a simple variable named y that stores an int value, and you want to access that value, you simply use y in an expression, like
x = y;
The type of the expression y is int, so the declaration of y is written
int y;
If, on the other hand, you have an array of int values, and you want to access a specific element, you would use the array name and an index along with the subscript operator to access that value, like
x = a[i];
The type of the expression a[i] is int, so the declaration of the array is written as
int arr[N]; // for some value N.
The "int-ness" of arr is given by the type specifier int; the "array-ness" of arr is given by the declarator arr[N]. The declarator gives us the name of the object being declared (arr) along with some additional type information not given by the type specifier ("is an N-element array"). The declaration "reads" as
arr -- arr
arr[N] -- is an N-element array
int arr[N]; -- of int
EDIT2
And after all that, I still haven't told you the real reason why array expressions are non-modifiable lvalues. So here's yet another chapter to this book of an answer.
C didn't spring fully formed from the mind of Dennis Ritchie; it was derived from an earlier language known as B (which was derived from BCPL).1 B was a "typeless" language; it didn't have different types for integers, floats, text, records, etc. Instead, everything was simply a fixed length word or "cell" (essentially an unsigned integer). Memory was treated as a linear array of cells. When you allocated an array in B, such as
auto V[10];
the compiler allocated 11 cells; 10 contiguous cells for the array itself, plus a cell that was bound to V containing the location of the first cell:
   +----+
V: |    | -----+
   +----+      |
    ...        |
   +----+      |
   |    | <----+
   +----+
   |    |
   +----+
   |    |
   +----+
   |    |
   +----+
    ...
When Ritchie was adding struct types to C, he realized that this arrangement was causing him some problems. For example, he wanted to create a struct type to represent an entry in a file or directory table:
struct {
int inumber;
char name[14];
};
He wanted the structure to not just describe the entry in an abstract manner, but also to represent the bits in the actual file table entry, which didn't have an extra cell or word to store the location of the first element in the array. So he got rid of it - instead of setting aside a separate location to store the address of the first element, he wrote C such that the address of the first element would be computed when the array expression was evaluated.
This is why you can't do something like
int a[N], b[N];
a = b;
because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it whenever the array expression is evaluated.
1. This is all taken from the paper The Development of the C Language

The term "lvalue of array type" literally refers to the array object as an lvalue of array type, i.e. array object as a whole. This lvalue is not modifiable as a whole, since there's no legal operation that can modify it as a whole. In fact, the only operations you can perform on an lvalue of array type are: unary & (address of), sizeof and implicit conversion to pointer type. None of these operations modify the array, which is why array objects are not modifiable.
a[i] does not work with lvalue of array type. a[i] designates an int object: the i-th element of array a. The semantics of this expression (if spelled out explicitly) is: *((int *) a + i). The very first step - (int *) a - already converts the lvalue of array type into an rvalue of type int *. At this point the lvalue of array type is out of the picture for good.
Incomplete type is a type whose size is not [yet] known. For example: a struct type that has been declared but not defined, an array type with unspecified size, the void type.

An incomplete type is a type which is declared but not defined, for example struct Foo;.
You can always assign to individual array elements (assuming they are not const). But you cannot assign something to the whole array.
C and C++ are quite confusing in that something like int a[10] = {0, 1, 2, 3}; is not an assignment but an initialization even though it looks pretty much like an assignment.
This is OK (initialization):
int a[10] = {0, 1, 2, 3};
This does not work in C/C++:
int a[10];
a = {0, 1, 2, 3};

Assuming a is an array of ints, a[10] isn't an array. It is an int.
a = {0} would be illegal.

Remember that the value of an array is actually the address (pointer) of its first element. This address can't be modified. So
int a[10], b[10];
a = b;
is illegal.
It has, of course, nothing to do with modifying the contents of the array, as in a[1] = 3.

Bracket order in multidimensional arrays

int data[3][5];
is a 3-element array of 5-element arrays.
Why? Intuitively, if int[3] is a 3-element array, then int[3][5] should be a 5-element array of 3-element arrays.

The intuition should come from the indexing convention - since it is an array of arrays, first index is selecting the element which is an array, the second index is selecting the element of the selected array. That is:
data[2][4] will select element number 4 of the array number 2 (mind the zero-basing).
Now the definition of such an array seems to be a bit counter-intuitive as you noted, but apparently it is this way just to be consistent with indexing syntax, otherwise it will be much more confusing.

C doesn't always work in an intuitive way because of things like the spiral rule, though maybe you're mis-applying it here.
As with any language, you need to accept the syntax for what it is, not what you think it is, or you'll constantly be fighting with the language on a semantic level.
Tools like cdecl explain it as:
declare data as array 3 of array 5 of int

This falls out of C's concept of declarators. The pointer-ness, array-ness, or function-ness of a declaration is specified in the declarator, while the type-ness is specified with a type specifier1:
int *p; // *p is the declarator
double arr[N][M]; // arr[N][M] is the declarator
char *foo( int x ); // *foo( int x ) is the declarator
This allows you to create arbitrarily complex types in a compact manner:
int *(*foo(void))[M][N];
foo is a function taking no parameters, returning a pointer to an M-element array of N-element arrays of pointer to int.
Thus, the actual type of an object or function is specified through the combination of the type specifier (and any qualifiers) and the declarator.
Unfortunately, "compact" is just another way of saying "eye-stabby". Declarations like that can be hard to read and understand. It does mean that things like multi-dimensional array declarations read kind of "backwards":
  +-------------------------------+
  |                               |
  v                               |
type arr -> array-of -> array-of -+
      ^
      |
  start here
But, if you work it through, it does make sense. Let's start with some arbitrary type T. We declare an array of T as
T arr[N];
Thus, arr is an N-element array of T. Now we replace T with an array type R [M]. This gives us
R arr[N][M];
arr is still an N-element array of something, and that something is R [M], which is why we write arr[N][M] instead of arr[M][N].
1. There are also storage-class specifiers, type qualifiers, etc., but we won't go into those here.

Row size not declared in 2d array in C

In some C programs involving 2D arrays, I noticed that the row size is not specified, and the compiler doesn't report any error about it. But when I tried specifying the row size but not the column size, the compiler threw an error.
Eg:
int arr[][5]; // correct
int arr[5][]; //compiler throws error
What's the reason?

We can define a 2-D array in C as:
int A[][n];
where n is some constant.
We must include the number of columns in the array because this specifies the size of each row. The two-dimensional array can be viewed as an array of rows. Once the compiler knows the size of a row in the array (which is defined by the value in the second square bracket, n here), it is able to correctly determine the beginning of each row.
In other words, it is needed to compute the relative offset of the item you're actually accessing.
We have offset = row*colwidth + col (counted in elements, not bytes).
The offsets are computed by the compiler using the size of the row, which happens to be the number/count of the columns.

6.7.6.2 Array declarators
Constraints
1 In addition to optional type qualifiers and the keyword static, the [ and ] may delimit
an expression or *. If they delimit an expression (which specifies the size of an array), the
expression shall have an integer type. If the expression is a constant expression, it shall
have a value greater than zero. The element type shall not be an incomplete or function
type. The optional type qualifiers and the keyword static shall appear only in a
declaration of a function parameter with an array type, and then only in the outermost
array type derivation.
...
Semantics
...
4 If the size is not present, the array type is an incomplete type...
C 2011 Online Draft
Emphasis added. Given an array declaration
T a[];
the type of a is incomplete - it's "unknown size array of T". However, per the constraint above, T itself must be a complete type. If T is an array type, its size must be known, a la R [N]:
R a[][N]; // a is an unknown-size array of N-element arrays of R
This is why the compiler accepts
int arr[][5];
since, while we don't yet know how many elements will be in arr, we know how big each of those elements will be (5 * sizeof (int)). Note that arr must be given a size before it can actually be used. The converse,
int arr[5][];
says that arr is a 5-element array of unknown-size arrays of int. We know how many elements we need, but we don't know how big those elements are going to be.
Now, why does C make this restriction? I can't provide an authoritative answer for that, but I suspect it has to do with the relationship between array and pointer operations in C. Remember that the expression a[i] is defined as *(a + i) - that is, take the address a and offset i elements (not bytes!!) from that address and dereference the result. That only works if the size of the element type is known.
It should be possible to model an array of N elements of unknown size, but I suspect that such a model is cumbersome enough that it's more trouble to implement than it's worth.

One-dimensional access to a multidimensional array: is it well-defined behaviour?

I imagine we all agree that it is considered idiomatic C to access a true multidimensional array by dereferencing a (possibly offset) pointer to its first element in a one-dimensional fashion, e.g.:
void clearBottomRightElement(int *array, int M, int N)
{
array[M*N-1] = 0; // Pretend the array is one-dimensional
}
int mtx[5][3];
...
clearBottomRightElement(&mtx[0][0], 5, 3);
However, the language-lawyer in me needs convincing that this is actually well-defined C! In particular:
Does the standard guarantee that the compiler won't put padding in-between e.g. mtx[0][2] and mtx[1][0]?
Normally, indexing off the end of an array (other than one-past the end) is undefined (C99, 6.5.6/8). So the following is clearly undefined:
struct {
int row[3]; // The object in question is an int[3]
int other[10];
} foo;
int *p = &foo.row[7]; // ERROR: A crude attempt to get &foo.other[4];
So by the same rule, one would expect the following to be undefined:
int mtx[5][3];
int (*row)[3] = &mtx[0]; // The object in question is still an int[3]
int *p = &(*row)[7]; // Why is this any better?
So why should this be defined?
int mtx[5][3];
int *p = &(&mtx[0][0])[7];
So what part of the C standard explicitly permits this? (Let's assume c99 for the sake of discussion.)
EDIT
Note that I have no doubt that this works fine in all compilers. What I'm querying is whether this is explicitly permitted by the standard.

All arrays (including multidimensional ones) are padding-free. Even if it's never explicitly mentioned, it can be inferred from sizeof rules.
Now, array subscription is a special case of pointer arithmetic, and C99 section 6.5.6, §8 states clearly that behaviour is only defined if the pointer operand and the resulting pointer lie in the same array (or one element past), which makes bounds-checking implementations of the C language possible.
This means that your example is, in fact, undefined behaviour. However, as most C implementations do not check bounds, it will work as expected - most compilers treat undefined pointer expressions like
mtx[0] + 5
identically to well-defined counterparts like
(int *)((char *)mtx + 5 * sizeof (int))
which is well-defined because any object (including the whole two-dimensional array) can always be treated as a one-dimensional array of type char.
On further meditation on the wording of section 6.5.6, splitting out-of-bounds access into seemingly well-defined subexpressions like
(mtx[0] + 3) + 2
reasoning that mtx[0] + 3 is a pointer to one element past the end of mtx[0] (making the first addition well-defined) and as well as a pointer to the first element of mtx[1] (making the second addition well-defined) is incorrect:
Even though mtx[0] + 3 and mtx[1] + 0 are guaranteed to compare equal (see section 6.5.9, §6), they are semantically different. For example, the former can't be dereferenced and thus does not point to an element of mtx[1].

The only obstacle to the kind of access you want to do is that objects of type int [5][3] and int [15] are not allowed to alias one another. Thus if the compiler is aware that a pointer of type int * points into one of the int [3] arrays of the former, it could impose array bounds restrictions that would prevent accessing anything outside that int [3] array.
You might be able to get around this issue by putting everything inside a union that contains both the int [5][3] array and the int [15] array, but I'm really unclear on whether the union hacks people use for type-punning are actually well-defined. This case might be slightly less problematic since you would not be type-punning individual cells, only the array logic, but I'm still not sure.
One special case that should be noted: if your type were unsigned char (or any char type), accessing the multi-dimensional array as a one-dimensional array would be perfectly well-defined. This is because the one-dimensional array of unsigned char that overlaps it is explicitly defined by the standard as the "representation" of the object, and is inherently allowed to alias it.

It is sure that there is no padding between the elements of an array.
There are provisions for doing address computation in a smaller size than the full address space. This could be used, for instance, in the huge mode of the 8086, so that the segment part would not always need updating if the compiler knew that you couldn't cross a segment boundary. (It's too long ago for me to remember whether the compilers I used took advantage of that.)
With my internal model -- I'm not sure it is perfectly the same as the standard one and it is too painful to check, the information being distributed everywhere --
what you are doing in clearBottomRightElement is valid.
int *p = &foo.row[7]; is undefined
int i = mtx[0][5]; is undefined
int *p = &row[7]; doesn't compile (gcc agrees with me)
int *p = &(&mtx[0][0])[7]; is in the gray zone (the last time I checked something like this in detail, I ended up considering it invalid in C90 and valid in C99; that could be the case here, or I could have missed something).

My understanding of the C99 standard is that there is no requirement that multidimensional arrays must be laid out in a contiguous order in memory. Below is the only relevant information I found in the standard (each dimension is guaranteed to be contiguous).
If you want to use the x[COLS*r + c] access, I suggest you stick to single dimension arrays.
Array subscripting
Successive subscript operators designate an element of a multidimensional array object.
If E is an n-dimensional array (n ≥ 2) with dimensions i × j × . . . × k, then E (used as
other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with
dimensions j × . . . × k. If the unary * operator is applied to this pointer explicitly, or
implicitly as a result of subscripting, the result is the pointed-to (n − 1)-dimensional array,
which itself is converted into a pointer if used as other than an lvalue. It follows from this
that arrays are stored in row-major order (last subscript varies fastest).
Array type
— An array type describes a contiguously allocated nonempty set of objects with a
particular member object type, called the element type.36) Array types are
characterized by their element type and by the number of elements in the array. An
array type is said to be derived from its element type, and if its element type is T , the
array type is sometimes called ‘‘array of T ’’. The construction of an array type from
an element type is called ‘‘array type derivation’’.

Equivalent C declarations

Are
int (*x)[10];
and
int x[10];
equivalent?
According to the "Clockwise Spiral" rule, they parse to different C declarations.
For the click-weary:
The ``Clockwise/Spiral Rule'' By David Anderson
There is a technique known as the ``Clockwise/Spiral Rule'' which enables any C programmer to parse in their head any C declaration!
There are three simple steps to follow:
1. Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements, replace them with the corresponding English statements:
[X] or [] => Array X size of... or Array undefined size of...
(type1, type2) => function passing type1 and type2 returning...
* => pointer(s) to...
2. Keep doing this in a spiral/clockwise direction until all tokens have been covered.
3. Always resolve anything in parenthesis first!

Follow this simple process when reading declarations:
Start at the variable name (or innermost construct if no identifier is present). Look right without jumping over a right parenthesis; say what you see. Look left again without jumping over a parenthesis; say what you see. Jump out a level of parentheses if any. Look right; say what you see. Look left; say what you see. Continue in this manner until you say the variable type or return type.
So:
int (*x)[10];
x is a pointer to an array of 10 ints
int x[10];
x is an array of 10 ints
int *x[10];
x is an array of 10 pointers to ints

They are not equal. In the first case x is a pointer to an array of 10 integers; in the second case x is an array of 10 integers.
The two types are different. You can see they're not the same thing by checking sizeof in the two cases.

I tend to follow The Precedence Rule for Understanding C Declarations which is given very nicely in the book Expert C Programming - Deep C Secrets by Peter van der Linden
A - Declarations are read by starting with the name and then reading in precedence order.
B - The precedence, from high to low, is:
B.1 parentheses grouping together parts of a declaration
B.2 the postfix operators: parentheses () indicating a function, and square brackets [] indicating an array.
B.3 the prefix operator: the asterisk denoting "pointer to".
C - If a const and/or volatile keyword is next to a type specifier (e.g. int, long, etc.) it applies to the type specifier. Otherwise the const and/or volatile keyword applies to the pointer asterisk on its immediate left.

For me, it's easier to remember the rule as absent any explicit grouping, () and [] bind before *. Thus, for a declaration like
T *a[N];
the [] bind before the *, so a is an N-element array of pointer. Breaking it down in steps:
a -- a
a[N] -- is an N-element array
*a[N] -- of pointer
T *a[N] -- to T.
For a declaration like
T (*a)[N];
the parens force the * to bind before the [], so
a -- a
(*a) -- is a pointer
(*a)[N] -- to an N-element array
T (*a)[N] -- of T
It's still the clockwise/spiral rule, just expressed in a more compact manner.

No. The first one declares a pointer to an array of 10 ints, and the second one declares an array of 10 ints.
