It is stated here that
The term modifiable lvalue is used to emphasize that the lvalue allows the designated object to be changed as well as examined. The following object types are lvalues, but not modifiable lvalues:
An array type
An incomplete type
A const-qualified type
A structure or union type with one of its members qualified as a const type
Because these lvalues are not modifiable, they cannot appear on the left side of an assignment statement.
Why is an object of array type not modifiable? Isn't it correct to write
int i = 5, a[10] = {0};
a[i] = 1;
?
And also, what is an incomplete type?
Assume the declaration
int a[10];
then all of the following are true:
the type of the expression a is "10-element array of int"; except when a is the operand of the sizeof or unary & operators, the expression will be converted to an expression of type "pointer to int" and its value will be the address of the first element in the array;
the type of the expression a[i] is int; it refers to the integer object stored as the i'th element of the array;
The expression a may not be the target of an assignment because C does not treat arrays like other variables, so you cannot write something like a = b or a = malloc(n * sizeof *a) or anything like that.
You'll notice I keep emphasizing the word "expression". There's a difference between the chunk of memory we set aside to hold 10 integers and the symbols (expressions) we use to refer to that chunk of memory. We can refer to it with the expression a. We can also create a pointer to that array:
int (*ptr)[10] = &a;
The expression *ptr also has type "10-element array of int", and it refers to the same chunk of memory that a refers to.
C does not treat array expressions (a, *ptr) like expressions of other types, and one of the differences is that an expression of array type may not be the target of an assignment. You cannot reassign a to refer to a different array object (same for the expression *ptr). You may assign a new value to a[i] or (*ptr)[i] (change the value of each array element), and you may assign ptr to point to a different array:
int b[10], c[10];
.....
ptr = &b;
.....
ptr = &c;
As for the second question...
An incomplete type lacks size information; declarations like
struct foo;
int bar[];
union bletch;
all create incomplete types because there isn't enough information for the compiler to determine how much storage to set aside for an object of that type. You cannot create objects of incomplete type; for example, you cannot declare
struct foo myFoo;
unless you complete the definition for struct foo. However, you can create pointers to incomplete types; for example, you could declare
struct foo *myFooPtr;
without completing the definition for struct foo because a pointer just stores the address of the object, and you don't need to know the type's size for that. This makes it possible to define self-referential types like
struct node {
T key; // for any type T
Q val; // for any type Q
struct node *left;
struct node *right;
};
The type definition for struct node isn't complete until we hit that closing }. Since we can declare a pointer to an incomplete type, we're okay. However, we could not define the struct as
struct node {
... // same as above
struct node left;
struct node right;
};
because the type isn't complete when we declare the left and right members, and also because each left and right member would each contain left and right members of their own, each of which would contain left and right members of their own, and on and on and on.
That's great for structs and unions, but what about
int bar[];
???
We've declared the symbol bar and indicated that it will be an array type, but the size is unknown at this point. Eventually we'll have to define it with a size, but this way the symbol can be used in contexts where the array size isn't meaningful or necessary. Don't have a good, non-contrived example off the top of my head to illustrate this, though.
EDIT
Responding to the comments here, since there isn't going to be room in the comments section for what I want to write (I'm in a verbose mood this evening). You asked:
Does it mean every variables are expression?
It means that any variable can be an expression, or part of an expression. Here's how the language standard defines the term expression:
6.5 Expressions
1 An expression is a sequence of operators and operands that specifies computation of a
value, or that designates an object or a function, or that generates side effects, or that
performs a combination thereof.
For example, the variable a all by itself counts as an expression; it designates the array object we defined to hold 10 integer values. It also evaluates to the address of the first element of the array. The variable a can also be part of a larger expression like a[i]; the operator is the subscript operator [] and the operands are the variables a and i. This expression designates a single member of the array, and it evaluates to the value currently stored in that member. That expression in turn can be part of a larger expression like a[i] = 0.
And also let me clear that, in the declaration int a[10], does a[] stands for array type
Yes, exactly.
In C, declarations are based on the types of expressions, rather than the types of objects. If you have a simple variable named y that stores an int value, and you want to access that value, you simply use y in an expression, like
x = y;
The type of the expression y is int, so the declaration of y is written
int y;
If, on the other hand, you have an array of int values, and you want to access a specific element, you would use the array name and an index along with the subscript operator to access that value, like
x = a[i];
The type of the expression a[i] is int, so the declaration of the array is written as
int arr[N]; // for some value N.
The "int-ness" of arr is given by the type specifier int; the "array-ness" of arr is given by the declarator arr[N]. The declarator gives us the name of the object being declared (arr) along with some additional type information not given by the type specifier ("is an N-element array"). The declaration "reads" as
arr            -- arr
arr[N]         -- is an N-element array
int arr[N];    -- of int
EDIT2
And after all that, I still haven't told you the real reason why array expressions are non-modifiable lvalues. So here's yet another chapter to this book of an answer.
C didn't spring fully formed from the mind of Dennis Ritchie; it was derived from an earlier language known as B (which was derived from BCPL). [1] B was a "typeless" language; it didn't have different types for integers, floats, text, records, etc. Instead, everything was simply a fixed-length word or "cell" (essentially an unsigned integer). Memory was treated as a linear array of cells. When you allocated an array in B, such as
auto V[10];
the compiler allocated 11 cells; 10 contiguous cells for the array itself, plus a cell that was bound to V containing the location of the first cell:
     +----+
  V: |    | -----+
     +----+      |
      ...        |
     +----+      |
     |    | <----+
     +----+
     |    |
     +----+
     |    |
     +----+
     |    |
     +----+
      ...
When Ritchie was adding struct types to C, he realized that this arrangement was causing him some problems. For example, he wanted to create a struct type to represent an entry in a file or directory table:
struct {
int inumber;
char name[14];
};
He wanted the structure to not just describe the entry in an abstract manner, but also to represent the bits in the actual file table entry, which didn't have an extra cell or word to store the location of the first element in the array. So he got rid of it - instead of setting aside a separate location to store the address of the first element, he wrote C such that the address of the first element would be computed when the array expression was evaluated.
This is why you can't do something like
int a[N], b[N];
a = b;
because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it during the translation phase.
1. This is all taken from the paper The Development of the C Language
The term "lvalue of array type" literally refers to the array object as an lvalue of array type, i.e. array object as a whole. This lvalue is not modifiable as a whole, since there's no legal operation that can modify it as a whole. In fact, the only operations you can perform on an lvalue of array type are: unary & (address of), sizeof and implicit conversion to pointer type. None of these operations modify the array, which is why array objects are not modifiable.
a[i] does not work with lvalue of array type. a[i] designates an int object: the i-th element of array a. The semantics of this expression (if spelled out explicitly) is: *((int *) a + i). The very first step - (int *) a - already converts the lvalue of array type into an rvalue of type int *. At this point the lvalue of array type is out of the picture for good.
Incomplete type is a type whose size is not [yet] known. For example: a struct type that has been declared but not defined, an array type with unspecified size, the void type.
An incomplete type is a type which is declared but not defined, for example struct Foo;.
You can always assign to individual array elements (assuming they are not const). But you cannot assign something to the whole array.
C and C++ are quite confusing in that something like int a[10] = {0, 1, 2, 3}; is not an assignment but an initialization even though it looks pretty much like an assignment.
This is OK (initialization):
int a[10] = {0, 1, 2, 3};
This does not work in C/C++:
int a[10];
a = {0, 1, 2, 3};
Assuming a is an array of ints, a[10] isn't an array. It is an int.
a = {0} would be illegal.
Remember that the value of an array is actually the address (pointer) of its first element. This address can't be modified. So
int a[10], b[10];
a = b;
is illegal.
It has of course nothing to do with modifying the content of the array as in a[1] = 3
Related
Assume the following code:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv)
{
int arrayXYZ[10];
int i;
int *pi;
int intVar;
for (i=0; i<10; i++){
arrayXYZ[i] = i;
}
pi = arrayXYZ; // Reference 1
pi++; // Reference 2
arrayXYZ++; // Reference 3
arrayXYZ = pi; // Reference 4
}
Reference 1 is correct: pi points to first element in arrayXYZ -> *pi = 0
Reference 2 is correct: pi is advanced to point to the next element -> *pi = 1
Reference 3 is not correct: I am not completely sure why. Every integer needs 4 bytes of memory. Hence, we cannot increment the address of the head of the array by just one? Assume we had a char array with sizeof(char)=1 -> Would the head of the array point to the next bucket?
Reference 4 is not correct: I am not completely sure why. Why cannot the head of the array point to the address to which pi points?
arrayXYZ++;
This is equivalent to:
arrayXYZ += 1;
which is equivalent to:
arrayXYZ = arrayXYZ + 1;
This is not allowed because an array is not a modifiable lvalue: it cannot be assigned to.
arrayXYZ = pi;
This fails for the same reason. An array can not be assigned to.
The other assignments work because you are allowed to assign to a pointer.
Also keep in mind that arrays and pointers are distinct datatypes. In C, there are circumstances where arrays decay into a pointer to their first element for convenience purposes. Which is why this works:
pi = arrayXYZ;
However, this is just an automatic conversion, so that you don't have to write:
pi = &arrayXYZ[0];
This automatic conversion does not mean that arrays are the same thing as pointers.
From C11 standard §6.3.2.1 (N1570)
An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined. When an object is said to have a particular type, the type is specified by the lvalue used to designate the object. A modifiable lvalue is an lvalue that does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a const-qualified type.
And also From §6.5.2.4
The operand of the postfix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.
As pointed out here, these are the reasons why those statements are illegal: arrayXYZ is not a modifiable lvalue, so it cannot be the operand of ++. In the same way, the left operand of an assignment has to be a modifiable lvalue, and here it is not.
Now to explain why the other two work: there is a thing called array decay. In most situations (the exceptions include being the operand of & or sizeof), an array is converted to a pointer to its first element, and that pointer is what gets assigned to pi. pi itself is a modifiable lvalue, which is why you can apply ++ to it.
For example,
int x[10];
int i = 0;
x = &i; //error occurs!
According to C - A Reference Manual, an array name cannot be an lvalue. Thus, x cannot be an lvalue. But, what is the reason the array name cannot be an lvalue? For example, why does an error occur in the third line?
Your reference is incorrect. An array can be an lvalue (but not a modifiable lvalue), and an "array name" (identifier) is always an lvalue.
Take your example:
int x[10];
int i = 0;
x = &i; //error occurs!
Apply C11 6.5.1, paragraph 2:
An identifier is a primary expression, provided it has been declared
as designating an object (in which case it is an lvalue) ...
We see that x is a primary expression and is an lvalue, because it has previously been declared as designating an array object.
However, the C language rules state that an array expression in various contexts, including the left-hand side of an assignment expression, is converted to a pointer which points at the first element of the array and is not an lvalue, even if the array was. Specifically:
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue. If the array object has
register storage class, the behavior is undefined.
(C11 6.3.2.1 paragraph 3).
The pointer which is the result of the conversion specified above is not an lvalue because an lvalue designates an object, and there is no suitable object holding the pointer value; the array object holds the elements of the array, not a pointer to those elements.
The example you use in your question implies that you understand that an array expression decays (is converted to) a pointer value, but I think you are failing to recognize that after the conversion, the pointer value and the array are two different things. The pointer is not an lvalue; the array might be (and in your example, it is). Whether or not arrays are lvalues in fact has no bearing on your example; it is the pointer value that you are trying to assign to.
If you were to ask instead: Why do arrays decay to pointers when they are on the left-hand-side of an assignment operator? - then I suspect that there is no particularly good answer. C just doesn't allow assignment to arrays, historically.
Array names are non-modifiable lvalues in C.:)
Arrays are named extents of memory where their elements are placed. So you may not substitute one extent of memory for another extent of memory. Each extent of memory initially allocated for an array declaration has its own unique name. Each array name is bound to its own extent of memory.
An array is an lvalue, however it is a non-modifiable lvalue.
It most likely has to do with compatibility of types. For example, you can do this:
struct ss {
char c[10];
};
...
struct ss s1 = { { "hello" } };
struct ss s2 = s1;
But not this:
char s1[10] = "hello";
char s2[10] = s1;
It's true that array names yield pointer values in many contexts. But so does the & operator, and you don't expect that to be assignable.
int i = 42;
int *j = malloc(sizeof *j);
&i = j; /* obviously wrong */
int a[] = {1,2,3};
&a[0] = j; /* also obviously wrong */
a = j; /* same as the previous line! */
So when learning the relationship between arrays and pointers, remember that a is usually the same as &a[0] and then you won't think lvalue-ness is an exception to the rule - it follows the rule perfectly.
I'm trying to swap two entries in a struct.
This is the struct:
struct hdr {
    uint8_t ether_dhost[6];
    uint8_t ether_shost[6];
};
When I try to save these values in temporary arrays, I get this error on this line:
uint8_t original_dhost[6];
original_dhost = ethernet_hdr->ether_dhost;
incompatible types when assigning to type 'uint8_t[6]' from type
'uint8_t *'
so instead I try using a pointer rather than an array:
uint8_t *original_dhost;
Then I get no error, but when I try to assign to the ethernet_hdr->ether_dhost, I get this error:
ethernet_hdr->ether_shost = original_dhost;
incompatible types when assigning to type ‘uint8_t[6]’ from type ‘uint8_t *’
How can I avoid the first error above? Specifically, why does the compiler say the field is 'uint8_t *' when I declare it as an array?
ether_dhost is an array. You can't copy to or from it using a simple assignment statement.
Your first error comes because ethernet_hdr->ether_dhost resolves to the address of the first element (a uint8_t pointer), but you can't assign its value to a new array.
You need to use memcpy (or a loop) to copy all the elements:
uint8_t original_dhost[6];
memcpy(original_dhost,ethernet_hdr->ether_dhost,sizeof(original_dhost));
There are several issues at play here.
First of all, an array expression may not be the target of an assignment. You cannot write something like
uint8_t original_dhost[6];
original_dhost = ethernet_hdr->ether_dhost;
because the expression original_dhost is not a modifiable lvalue. There are reasons for this, which will become apparent below. To copy the contents of one array to another, you will either need to copy each element individually, or use the memcpy library function:
memcpy( original_dhost, ethernet_hdr->ether_dhost, sizeof original_dhost );
Except when it is the operand of the sizeof or unary & operators, or is a string literal being used to initialize an array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. The result is not a modifiable lvalue; that is, it cannot be the target of an assignment.
In the statement
original_dhost = ethernet_hdr->ether_dhost;
the expression ethernet_hdr->ether_dhost has type "6-element array of uint8_t"; since it is not the operand of either the sizeof or unary & operators, it is converted to an expression of type "pointer to uint8_t", or uint8_t *. This type is not compatible with uint8_t [6], hence the first error. The second error is the same problem, you've just reversed the players involved.
So why not simply convert the left hand side of the assignment to a pointer as well and let the assignment succeed? Time for a short history lesson.
C was derived from an earlier language called B, which was a "typeless" language; all data were stored in fixed-size words, or "cells", regardless of whether the data were being used to represent integers, real values, text, whatever. Memory was treated as a linear array of cells. When you declared an array in B, such as
auto arr[N];
the compiler would set aside N+1 cells; N cells for the array, and an additional cell that stored the offset to the first element of the array, which would be bound to the symbol arr, like so:
          +---+
     arr: |   | ---+
          +---+    |
           ...     |
          +---+    |
  arr[0]: |   | <--+
          +---+
  arr[1]: |   |
          +---+
  arr[2]: |   |
          +---+
           ...
          +---+
arr[N-1]: |   |
          +---+
As in C, subscript operations like arr[i] were computed as *(arr + i); you added the value i to the offset value stored in arr, then dereferenced the result.
Dennis Ritchie initially kept B's array semantics as he was developing C, but he ran into a problem when he started adding the struct type to the language. He wanted the struct contents to map directly onto memory; an example he gives is of a file system entry, like
struct {
int inode;
char name[14];
};
He wanted the struct to contain a 2-byte integer value immediately followed by a 0-terminated string, but he couldn't figure out what to do with the pointer to the name array: should it be stored as part of the struct, or stored separately? If separately, where should it be stored?
He solved the problem by getting rid of the array pointer altogether; instead of setting aside storage for a pointer to the first element of the array, he designed the language so that the pointer value would be computed from the array expression itself. Thus, in C, when you declare an array like
T arr[N];
only N elements of type T are allocated:
          +---+
  arr[0]: |   |
          +---+
  arr[1]: |   |
          +---+
  arr[2]: |   |
          +---+
           ...
          +---+
arr[N-1]: |   |
          +---+
There's no separate storage bound to the symbol arr. This is why the expressions &arr and arr both yield the same value (the address of the first element in the array), even though the two expressions have different types (T (*)[N] and T *, respectively). And this is why an array expression may not be the target of an assignment; there's nothing to assign a value to.
For no particular reason (besides some historical accident), in C you cannot assign arrays directly; you have to copy elements one by one (or use memcpy).
memcpy(ethernet_hdr->ether_shost,original_dhost,sizeof(original_dhost));
An array is a fixed number of contiguous objects, not a pointer, so it cannot be assigned as a whole. To achieve your desired move, use memcpy.
Struct entry declares a field of type array, but compiler says it's a pointer
Because arrays almost always decay (are implicitly converted) into pointers when being referred to.
when I try to assign to the ethernet_hdr->ether_dhost, I get this error:
Sure. Arrays are not modifiable lvalues; if you want to "assign" them, you can't do that directly. Use memcpy() instead.
The compiler is not saying ethernet_hdr->ether_dhost is a 'uint8_t *'.
It is saying that the right-hand-side of the = is a uint8_t * as that is how arrays are converted when used as an R-value.
C11 6.3.2.1 3 "Except when it is the operand of the sizeof operator, ..., or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. ..."
Had OP performed the following, the expected result of 6 would be printed, because ethernet_hdr->ether_dhost is an array and not a pointer.
printf("%zu\n", sizeof(ethernet_hdr->ether_dhost));
OP can avoid this a number of ways.
Others have mentioned using memcpy().
Assignment with arrays does not work, but assignment with structures does. By enclosing each array in a structure, the whole structure may be assigned.
There is also a pre-C89, possibly non-standard, method. It is not detailed here.
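#include <stdint.h>   // uint8_t
#include <stdio.h>
#include <stdlib.h>

// Wrapping the array in a struct makes it assignable as a whole.
struct U6 {
    uint8_t u[6];
};

struct hdr {
    struct U6 ether_dhost;
    struct U6 ether_shost;
};

struct hdr *ethernet_hdr;

void foo3(void) {
    struct U6 original_dhost = {{ 1,2,3,4,5,6}};
    ethernet_hdr = malloc(sizeof(*ethernet_hdr));
    if (ethernet_hdr == NULL) {
        return;   // allocation failed, nothing to copy
    }
    ethernet_hdr->ether_dhost.u[1] = 7;
    // Copy structure
    original_dhost = ethernet_hdr->ether_dhost;
    printf("%u\n", (unsigned)original_dhost.u[1]);
    free(ethernet_hdr);
    ethernet_hdr = NULL;
}

int main(void) {
    foo3();
    return 0;
}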
It is stated here that
The term modifiable lvalue is used to emphasize that the lvalue allows the designated object to be changed as well as examined. The following object types are lvalues, but not modifiable lvalues:
An array type
An incomplete type
A const-qualified type
A structure or union type with one of its members qualified as a const type
Because these lvalues are not modifiable, they cannot appear on the left side of an assignment statement.
Why array type object is not modifiable? Isn't it correct to write
int i = 5, a[10] = {0};
a[i] = 1;
?
And also, what is an incomplete type?
Assume the declaration
int a[10];
then all of the following are true:
the type of the expression a is "10-element array of int"; except when a is the operand of the sizeof or unary & operators, the expression will be converted to an expression of type "pointer to int" and its value will be the address of the first element in the array;
the type of the expression a[i] is int; it refers to the integer object stored as the i'th element of the array;
The expression a may not be the target of an assignment because C does not treat arrays like other variables, so you cannot write something like a = b or a = malloc(n * sizeof *a) or anything like that.
You'll notice I keep emphasizing the word "expression". There's a difference between the chunk of memory we set aside to hold 10 integers and the symbols (expressions) we use to refer to that chunk of memory. We can refer to it with the expression a. We can also create a pointer to that array:
int (*ptr)[10] = &a;
The expression *ptr also has type "10-element array of int", and it refers to the same chunk of memory that a refers to.
C does not treat array expressions (a, *ptr) like expressions of other types, and one of the differences is that an expression of array type may not be the target of an assignment. You cannot reassign a to refer to a different array object (same for the expression *ptr). You may assign a new value to a[i] or (*ptr)[i] (change the value of each array element), and you may assign ptr to point to a different array:
int b[10], c[10];
.....
ptr = &b;
.....
ptr = &c;
As for the second question...
An incomplete type lacks size information; declarations like
struct foo;
int bar[];
union bletch;
all create incomplete types because there isn't enough information for the compiler to determine how much storage to set aside for an object of that type. You cannot create objects of incomplete type; for example, you cannot declare
struct foo myFoo;
unless you complete the definition for struct foo. However, you can create pointers to incomplete types; for example, you could declare
struct foo *myFooPtr;
without completing the definition for struct foo because a pointer just stores the address of the object, and you don't need to know the type's size for that. This makes it possible to define self-referential types like
struct node {
T key; // for any type T
Q val; // for any type Q
struct node *left;
struct node *right;
};
The type definition for struct node isn't complete until we hit that closing }. Since we can declare a pointer to an incomplete type, we're okay. However, we could not define the struct as
struct node {
... // same as above
struct node left;
struct node right;
};
because the type isn't complete when we declare the left and right members, and also because each left and right member would each contain left and right members of their own, each of which would contain left and right members of their own, and on and on and on.
That's great for structs and unions, but what about
int bar[];
???
We've declared the symbol bar and indicated that it will be an array type, but the size is unknown at this point. Eventually we'll have to define it with a size, but this way the symbol can be used in contexts where the array size isn't meaningful or necessary. Don't have a good, non-contrived example off the top of my head to illustrate this, though.
EDIT
Responding to the comments here, since there isn't going to be room in the comments section for what I want to write (I'm in a verbose mood this evening). You asked:
Does it mean every variables are expression?
It means that any variable can be an expression, or part of an expression. Here's how the language standard defines the term expression:
6.5 Expressions
1 An expression is a sequence of operators and operands that specifies computation of a
value, or that designates an object or a function, or that generates side effects, or that
performs a combination thereof.
For example, the variable a all by itself counts as an expression; it designates the array object we defined to hold 10 integer values. It also evaluates to the address of the first element of the array. The variable a can also be part of a larger expression like a[i]; the operator is the subscript operator [] and the operands are the variables a and i. This expression designates a single member of the array, and it evaluates to the value currectly stored in that member. That expression in turn can be part of a larger expression like a[i] = 0.
And also let me clear that, in the declaration int a[10], does a[] stands for array type
Yes, exactly.
In C, declarations are based on the types of expressions, rather than the types of objects. If you have a simple variable named y that stores an int value, and you want to access that value, you simply use y in an expression, like
x = y;
The type of the expression y is int, so the declaration of y is written
int y;
If, on the other hand, you have an array of int values, and you want to access a specific element, you would use the array name and an index along with the subscript operator to access that value, like
x = a[i];
The type of the expression a[i] is int, so the declaration of the array is written as
int arr[N]; // for some value N.
The "int-ness" of arr is given by the type specifier int; the "array-ness" of arr is given by the declarator arr[N]. The declarator gives us the name of the object being declared (arr) along with some additional type information not given by the type specifier ("is an N-element array"). The declaration "reads" as
a -- a
a[N] -- is an N-element array
int a[N]; -- of int
EDIT2
And after all that, I still haven't told you the real reason why array expressions are non-modifiable lvalues. So here's yet another chapter to this book of an answer.
C didn't spring fully formed from the mind of Dennis Ritchie; it was derived from an earlier language known as B (which was derived from BCPL).1 B was a "typeless" language; it didn't have different types for integers, floats, text, records, etc. Instead, everything was simply a fixed length word or "cell" (essentially an unsigned integer). Memory was treated as a linear array of cells. When you allocated an array in B, such as
auto V[10];
the compiler allocated 11 cells; 10 contiguous cells for the array itself, plus a cell that was bound to V containing the location of the first cell:
+----+
V: | | -----+
+----+ |
... |
+----+ |
| | <----+
+----+
| |
+----+
| |
+----+
| |
+----+
...
When Ritchie was adding struct types to C, he realized that this arrangement was causing him some problems. For example, he wanted to create a struct type to represent an entry in a file or directory table:
struct {
int inumber;
char name[14];
};
He wanted the structure to not just describe the entry in an abstract manner, but also to represent the bits in the actual file table entry, which didn't have an extra cell or word to store the location of the first element in the array. So he got rid of it - instead of setting aside a separate location to store the address of the first element, he wrote C such that the address of the first element would be computed when the array expression was evaluated.
This is why you can't do something like
int a[N], b[N];
a = b;
because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it during the translation phase.
1. This is all taken from the paper The Development of the C Language
The term "lvalue of array type" literally refers to the array object as an lvalue of array type, i.e. array object as a whole. This lvalue is not modifiable as a whole, since there's no legal operation that can modify it as a whole. In fact, the only operations you can perform on an lvalue of array type are: unary & (address of), sizeof and implicit conversion to pointer type. None of these operations modify the array, which is why array objects are not modifiable.
a[i] does not operate on the lvalue of array type as a whole. a[i] designates an int object: the i-th element of array a. The semantics of this expression (if spelled out explicitly) is: *((int *) a + i). The very first step - (int *) a - already converts the lvalue of array type into an rvalue of type int *. At this point the lvalue of array type is out of the picture for good.
An incomplete type is a type whose size is not [yet] known. Examples: a struct type that has been declared but not defined, an array type with unspecified size, and the void type.
An incomplete type is a type which is declared but not defined, for example struct Foo;.
You can always assign to individual array elements (assuming they are not const). But you cannot assign something to the whole array.
C and C++ are quite confusing in that something like int a[10] = {0, 1, 2, 3}; is not an assignment but an initialization even though it looks pretty much like an assignment.
This is OK (initialization):
int a[10] = {0, 1, 2, 3};
This does not work in C/C++:
int a[10];
a = {0, 1, 2, 3};
Assuming a is an array of ints, a[10] isn't an array. It is an int.
a = {0} would be illegal.
Remember that in most expressions an array evaluates to the address of (a pointer to) its first element. That address can't be modified. So
int a[10], b[10];
a = b;
is illegal.
This of course has nothing to do with modifying the content of the array, as in a[1] = 3.
I thought I really understood this, and re-reading the standard (ISO 9899:1990) just confirms my obviously wrong understanding, so now I ask here.
The following program crashes:
#include <stdio.h>
#include <stddef.h>

typedef struct {
    int array[3];
} type1_t;

typedef struct {
    int *ptr;
} type2_t;

type1_t my_test = { {1, 2, 3} };

int main(int argc, char *argv[])
{
    (void)argc;
    (void)argv;
    type1_t *type1_p = &my_test;
    type2_t *type2_p = (type2_t *) &my_test;
    /* offsetof yields a size_t, so cast it to match %lu */
    printf("offsetof(type1_t, array) = %lu\n", (unsigned long) offsetof(type1_t, array)); // 0
    printf("my_test.array[0] = %d\n", my_test.array[0]);
    printf("type1_p->array[0] = %d\n", type1_p->array[0]);
    printf("type2_p->ptr[0] = %d\n", type2_p->ptr[0]); // this line crashes
    return 0;
}
Comparing the expressions my_test.array[0] and type2_p->ptr[0] according to my interpretation of the standard:
6.3.2.1 Array subscripting
"The definition of the subscript
operator [] is that E1[E2] is
identical to (*((E1)+(E2)))."
Applying this gives:
my_test.array[0]
(*((E1)+(E2)))
(*((my_test.array)+(0)))
(*(my_test.array+0))
(*(my_test.array))
(*my_test.array)
*my_test.array
type2_p->ptr[0]
(*((E1)+(E2)))
(*((type2_p->ptr)+(0)))
(*(type2_p->ptr+0))
(*(type2_p->ptr))
(*type2_p->ptr)
*type2_p->ptr
type2_p->ptr has type "pointer to int" and the value is the start address of my_test. *type2_p->ptr therefore evaluates to an integer object whose storage is at the same address that my_test has.
Further:
6.2.2.1 Lvalues, arrays, and function designators
"Except when it is the operand of the
sizeof operator or the unary &
operator, ... , an lvalue that has
type array of type is converted to
an expression with type pointer to
type that points to the initial
element of the array object and is not
an lvalue."
my_test.array has type "array of int" and is, as described above, converted to "pointer to int" with the address of the first element as its value. *my_test.array therefore evaluates to an integer object whose storage is at the same address as the first element in the array.
And finally
6.5.2.1 Structure and union specifiers
A pointer to a structure object,
suitably converted, points to its
initial member ..., and vice versa.
There may be unnamed padding within a
structure object, but not at its
beginning, as necessary to achieve the
appropriate alignment.
Since the first member of type1_t is the array, the start address of that and of the whole type1_t object is the same, as described above. My understanding was therefore that *type2_p->ptr evaluates to an integer whose storage is at the same address as the first element in the array and is thus identical to *my_test.array.
But this cannot be the case, because the program crashes consistently on Solaris, Cygwin and Linux with gcc versions 2.95.3, 3.4.4 and 4.3.2, so any environmental issue is completely out of the question.
Where is my reasoning wrong/what do I not understand?
How do I declare type2_t to make ptr point to the first member of the array?
Please forgive me if I overlook anything in your analysis. But I think the fundamental bug in all that is this wrong assumption:
type2_p->ptr has type "pointer to int" and the value is the start address of my_test.
There is nothing that makes it have that value. Rather, it very probably points somewhere like
0x00000001
Because what you do is to interpret the bytes making up that integer array as a pointer. Then you add something to it and subscript.
Also, I highly doubt your cast to the other struct is actually valid (as in, guaranteed to work). You may cast and then read a common initial sequence of either struct if both of them are members of a union. But they are not in your example. You may also cast to a pointer to the first member. For example:
#include <assert.h>

typedef struct {
    int array[3];
} type1_t;

type1_t f = { { 1, 2, 3 } };

int main(void) {
    /* a pointer to the struct, suitably converted, points to its first member */
    int (*arrayp)[3] = (int(*)[3])&f;
    (*arrayp)[0] = 3;
    assert(f.array[0] == 3);
    return 0;
}
An array is a kind of storage. Syntactically, it's used as a pointer, but physically, there's no "pointer" variable in that struct — just the three ints. On the other hand, the int pointer is an actual datatype stored in the struct. Therefore, when you perform the cast, you are probably* making ptr take on the value of the first element in the array, namely 1.
*I'm not sure this is actually defined behavior, but that's how it will work on most common systems at least.
Where is my reasoning wrong/what do I not understand?
type1_t::array (not strictly C syntax) is not an int *; it is an int [3].
How do I declare type2_t to make ptr point to the first member of the array?
typedef struct
{
int ptr[];
} type2_t;
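That declares a flexible array member. (Strictly, ISO C requires a structure with a flexible array member to have at least one other named member, so this exact declaration relies on a compiler extension such as GCC's.) From the C Standard (6.7.2.1 paragraph 16):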
However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array.
I.e., it can alias type1_t::array properly.
It's got to be defined behaviour. Think about it in terms of memory.
For simplicity, assume my_test is at address 0x80000000.
type1_p == 0x80000000
&type1_p->array[0] == 0x80000000 // array[0] == 1
&type1_p->array[1] == 0x80000004 // array[1] == 2
&type1_p->array[2] == 0x80000008 // array[2] == 3
When you cast it to type2_t,
type2_p == 0x80000000
&type2_p->ptr == 0x80000000 // type2_p->ptr == 1
type2_p->ptr[0] == *(type2_p->ptr) == *1
To do what you want, you would have to either create a secondary structure and assign the address of the array to ptr (e.g. type2_p->ptr = type1_p->array) or declare ptr as an array (or a flexible array member, e.g. int ptr[]).
Alternatively, you could access the elements in an ugly manner: (&type2_p->ptr)[0], (&type2_p->ptr)[1]. However, be careful here, since (&type2_p->ptr)[0] is actually an int *, not an int. On 64-bit platforms with 4-byte ints, for instance, (&type2_p->ptr)[0] combines the first two ints: 0x100000002 (4294967298) on a big-endian machine, or 0x200000001 (8589934593) on a little-endian one.