Pointer vs array in C, non-trivial difference - c

I thought I really understood this, and re-reading the standard (ISO 9899:1990) just confirms my obviously wrong understanding, so now I ask here.
The following program crashes:
#include <stdio.h>
#include <stddef.h>
typedef struct {
int array[3];
} type1_t;
typedef struct {
int *ptr;
} type2_t;
type1_t my_test = { {1, 2, 3} };
int main(int argc, char *argv[])
{
(void)argc;
(void)argv;
type1_t *type1_p = &my_test;
type2_t *type2_p = (type2_t *) &my_test;
printf("offsetof(type1_t, array) = %lu\n", (unsigned long)offsetof(type1_t, array)); // 0
printf("my_test.array[0] = %d\n", my_test.array[0]);
printf("type1_p->array[0] = %d\n", type1_p->array[0]);
printf("type2_p->ptr[0] = %d\n", type2_p->ptr[0]); // this line crashes
return 0;
}
Comparing the expressions my_test.array[0] and type2_p->ptr[0] according to my interpretation of the standard:
6.3.2.1 Array subscripting
"The definition of the subscript
operator [] is that E1[E2] is
identical to (*((E1)+(E2)))."
Applying this gives:
my_test.array[0]
(*((E1)+(E2)))
(*((my_test.array)+(0)))
(*(my_test.array+0))
(*(my_test.array))
(*my_test.array)
*my_test.array
type2_p->ptr[0]
(*((E1)+(E2)))
(*((type2_p->ptr)+(0)))
(*(type2_p->ptr+0))
(*(type2_p->ptr))
(*type2_p->ptr)
*type2_p->ptr
type2_p->ptr has type "pointer to int" and the value is the start address of my_test. *type2_p->ptr therefore evaluates to an integer object whose storage is at the same address that my_test has.
Further:
6.2.2.1 Lvalues, arrays, and function designators
"Except when it is the operand of the
sizeof operator or the unary &
operator, ... , an lvalue that has
type array of type is converted to
an expression with type pointer to
type that points to the initial
element of the array object and is not
an lvalue."
my_test.array has type "array of int" and is, as described above, converted to "pointer to int" with the address of the first element as its value. *my_test.array therefore evaluates to an integer object whose storage is at the same address as the first element of the array.
And finally
6.5.2.1 Structure and union specifiers
A pointer to a structure object,
suitably converted, points to its
initial member ..., and vice versa.
There may be unnamed padding within a
structure object, but not at its
beginning, as necessary to achieve the
appropriate alignment.
Since the first member of type1_t is the array, the start address of
that and the whole type1_t object is the same as described above.
My understanding was therefore that *type2_p->ptr evaluates to
an integer whose storage is at the same address as the first
element in the array and thus is identical to *my_test.array.
But this cannot be the case, because the program crashes consistently
on solaris, cygwin and linux with gcc versions 2.95.3, 3.4.4
and 4.3.2, so any environmental issue is completely out of the question.
Where is my reasoning wrong/what do I not understand?
How do I declare type2_t to make ptr point to the first member of the array?

Please forgive me if I overlook anything in your analysis. But I think the fundamental bug in all that is this wrong assumption:
type2_p->ptr has type "pointer to int" and the value is the start address of my_test.
There is nothing that makes it have that value. Rather, it very probably holds the value
0x00000001
Because what you do is to interpret the bytes making up that integer array as a pointer. Then you add something to it and subscript.
Also, I highly doubt your cast to the other struct is actually valid (as in, guaranteed to work). You may cast and then read a common initial sequence of either struct if both of them are members of a union. But they are not in your example. You may also cast to a pointer to the first member. For example:
#include <assert.h>
typedef struct {
int array[3];
} type1_t;
type1_t f = { { 1, 2, 3 } };
int main(void) {
int (*arrayp)[3] = (int(*)[3])&f;
(*arrayp)[0] = 3;
assert(f.array[0] == 3);
return 0;
}

An array is a kind of storage. Syntactically, it's used as a pointer, but physically, there's no "pointer" variable in that struct — just the three ints. On the other hand, the int pointer is an actual datatype stored in the struct. Therefore, when you perform the cast, you are probably* making ptr take on the value of the first element in the array, namely 1.
*I'm not sure this is actually defined behavior, but that's how it will work on most common systems at least.

Where is my reasoning wrong/what do I not understand?
type1_t::array (not strictly C syntax) is not an int *; it is an int [3].
How do I declare type2_t to make ptr point to the first member of the array?
typedef struct
{
int ptr[];
} type2_t;
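That declares a flexible array member. Strictly, the standard only permits one as the last member of a structure with more than one named member, so a conforming compiler may reject a struct that contains nothing else. From the C Standard (6.7.2.1 paragraph 16):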
However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array.
I.e., it can alias type1_t::array properly.

It's got to be defined behaviour. Think about it in terms of memory.
For simplicity, assume my_test is at address 0x80000000.
type1_p == 0x80000000
&type1_p->array[0] == 0x80000000 // array[0] == 1
&type1_p->array[1] == 0x80000004 // array[1] == 2
&type1_p->array[2] == 0x80000008 // array[2] == 3
When you cast it to type2_t,
type2_p == 0x80000000
&type2_p->ptr == 0x80000000 // type2_p->ptr == 1
type2_p->ptr[0] == *(type2_p->ptr) == *1
To do what you want, you would have to either create a secondary structure and assign the address of the array to ptr (e.g. type2_p->ptr = type1_p->array) or declare ptr as an array (or a flexible array member, e.g. int ptr[]).
Alternatively, you could access the elements in an ugly manner: (&type2_p->ptr)[0], (&type2_p->ptr)[1]. However, be careful here since (&type2_p->ptr)[0] will actually be an int*, not an int. On 64-bit platforms with 4-byte ints, for instance, (&type2_p->ptr)[0] spans the first two array elements: it will be 0x100000002 (4294967298) on a big-endian machine, or 0x200000001 on a little-endian one such as x86-64.

Related

Pick one string from an Array of 4 strings in C [duplicate]

It is stated here that
The term modifiable lvalue is used to emphasize that the lvalue allows the designated object to be changed as well as examined. The following object types are lvalues, but not modifiable lvalues:
An array type
An incomplete type
A const-qualified type
A structure or union type with one of its members qualified as a const type
Because these lvalues are not modifiable, they cannot appear on the left side of an assignment statement.
Why is an object of array type not modifiable? Isn't it correct to write
int i = 5, a[10] = {0};
a[i] = 1;
?
And also, what is an incomplete type?
Assume the declaration
int a[10];
then all of the following are true:
the type of the expression a is "10-element array of int"; except when a is the operand of the sizeof or unary & operators, the expression will be converted to an expression of type "pointer to int" and its value will be the address of the first element in the array;
the type of the expression a[i] is int; it refers to the integer object stored as the i'th element of the array;
The expression a may not be the target of an assignment because C does not treat arrays like other variables, so you cannot write something like a = b or a = malloc(n * sizeof *a) or anything like that.
You'll notice I keep emphasizing the word "expression". There's a difference between the chunk of memory we set aside to hold 10 integers and the symbols (expressions) we use to refer to that chunk of memory. We can refer to it with the expression a. We can also create a pointer to that array:
int (*ptr)[10] = &a;
The expression *ptr also has type "10-element array of int", and it refers to the same chunk of memory that a refers to.
C does not treat array expressions (a, *ptr) like expressions of other types, and one of the differences is that an expression of array type may not be the target of an assignment. You cannot reassign a to refer to a different array object (same for the expression *ptr). You may assign a new value to a[i] or (*ptr)[i] (change the value of each array element), and you may assign ptr to point to a different array:
int b[10], c[10];
.....
ptr = &b;
.....
ptr = &c;
As for the second question...
An incomplete type lacks size information; declarations like
struct foo;
int bar[];
union bletch;
all create incomplete types because there isn't enough information for the compiler to determine how much storage to set aside for an object of that type. You cannot create objects of incomplete type; for example, you cannot declare
struct foo myFoo;
unless you complete the definition for struct foo. However, you can create pointers to incomplete types; for example, you could declare
struct foo *myFooPtr;
without completing the definition for struct foo because a pointer just stores the address of the object, and you don't need to know the type's size for that. This makes it possible to define self-referential types like
struct node {
T key; // for any type T
Q val; // for any type Q
struct node *left;
struct node *right;
};
The type definition for struct node isn't complete until we hit that closing }. Since we can declare a pointer to an incomplete type, we're okay. However, we could not define the struct as
struct node {
... // same as above
struct node left;
struct node right;
};
because the type isn't complete when we declare the left and right members, and also because each left and right member would each contain left and right members of their own, each of which would contain left and right members of their own, and on and on and on.
That's great for structs and unions, but what about
int bar[];
???
We've declared the symbol bar and indicated that it will be an array type, but the size is unknown at this point. Eventually we'll have to define it with a size, but this way the symbol can be used in contexts where the array size isn't meaningful or necessary. Don't have a good, non-contrived example off the top of my head to illustrate this, though.
EDIT
Responding to the comments here, since there isn't going to be room in the comments section for what I want to write (I'm in a verbose mood this evening). You asked:
Does it mean every variables are expression?
It means that any variable can be an expression, or part of an expression. Here's how the language standard defines the term expression:
6.5 Expressions
1 An expression is a sequence of operators and operands that specifies computation of a
value, or that designates an object or a function, or that generates side effects, or that
performs a combination thereof.
For example, the variable a all by itself counts as an expression; it designates the array object we defined to hold 10 integer values. It also evaluates to the address of the first element of the array. The variable a can also be part of a larger expression like a[i]; the operator is the subscript operator [] and the operands are the variables a and i. This expression designates a single element of the array, and it evaluates to the value currently stored in that element. That expression in turn can be part of a larger expression like a[i] = 0.
And also let me clear that, in the declaration int a[10], does a[] stands for array type
Yes, exactly.
In C, declarations are based on the types of expressions, rather than the types of objects. If you have a simple variable named y that stores an int value, and you want to access that value, you simply use y in an expression, like
x = y;
The type of the expression y is int, so the declaration of y is written
int y;
If, on the other hand, you have an array of int values, and you want to access a specific element, you would use the array name and an index along with the subscript operator to access that value, like
x = a[i];
The type of the expression a[i] is int, so the declaration of the array is written as
int a[N]; // for some value N.
The "int-ness" of a is given by the type specifier int; the "array-ness" of a is given by the declarator a[N]. The declarator gives us the name of the object being declared (a) along with some additional type information not given by the type specifier ("is an N-element array"). The declaration "reads" as
    a       -- a
    a[N]    -- is an N-element array
int a[N];   -- of int
EDIT2
And after all that, I still haven't told you the real reason why array expressions are non-modifiable lvalues. So here's yet another chapter to this book of an answer.
C didn't spring fully formed from the mind of Dennis Ritchie; it was derived from an earlier language known as B (which was derived from BCPL). [1] B was a "typeless" language; it didn't have different types for integers, floats, text, records, etc. Instead, everything was simply a fixed length word or "cell" (essentially an unsigned integer). Memory was treated as a linear array of cells. When you allocated an array in B, such as
auto V[10];
the compiler allocated 11 cells; 10 contiguous cells for the array itself, plus a cell that was bound to V containing the location of the first cell:
     +----+
V:   |    | -----+
     +----+      |
      ...        |
     +----+      |
     |    | <----+
     +----+
     |    |
     +----+
     |    |
     +----+
     |    |
     +----+
      ...
When Ritchie was adding struct types to C, he realized that this arrangement was causing him some problems. For example, he wanted to create a struct type to represent an entry in a file or directory table:
struct {
int inumber;
char name[14];
};
He wanted the structure to not just describe the entry in an abstract manner, but also to represent the bits in the actual file table entry, which didn't have an extra cell or word to store the location of the first element in the array. So he got rid of it - instead of setting aside a separate location to store the address of the first element, he wrote C such that the address of the first element would be computed when the array expression was evaluated.
This is why you can't do something like
int a[N], b[N];
a = b;
because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it during the translation phase.
[1] This is all taken from Dennis Ritchie's paper "The Development of the C Language".
The term "lvalue of array type" literally refers to the array object as an lvalue of array type, i.e. array object as a whole. This lvalue is not modifiable as a whole, since there's no legal operation that can modify it as a whole. In fact, the only operations you can perform on an lvalue of array type are: unary & (address of), sizeof and implicit conversion to pointer type. None of these operations modify the array, which is why array objects are not modifiable.
a[i] does not work with lvalue of array type. a[i] designates an int object: the i-th element of array a. The semantics of this expression (if spelled out explicitly) is: *((int *) a + i). The very first step - (int *) a - already converts the lvalue of array type into an rvalue of type int *. At this point the lvalue of array type is out of the picture for good.
Incomplete type is a type whose size is not [yet] known. For example: a struct type that has been declared but not defined, an array type with unspecified size, the void type.
An incomplete type is a type which is declared but not defined, for example struct Foo;.
You can always assign to individual array elements (assuming they are not const). But you cannot assign something to the whole array.
C and C++ are quite confusing in that something like int a[10] = {0, 1, 2, 3}; is not an assignment but an initialization even though it looks pretty much like an assignment.
This is OK (initialization):
int a[10] = {0, 1, 2, 3};
This does not work in C/C++:
int a[10];
a = {0, 1, 2, 3};
Assuming a is an array of ints, a[10] isn't an array. It is an int.
a = {0} would be illegal.
Remember that the value of an array is actually the address (pointer) of its first element. This address can't be modified. So
int a[10], b[10];
a = b
is illegal.
It has of course nothing to do with modifying the content of the array as in a[1] = 3

Does a C struct hold its members in a contiguous block of memory? [duplicate]

Let's say my code is:
typedef struct {
int x;
double y;
char z;
} Foo;
would x, y, and z, be right next to each other in memory? Could pointer arithmetic 'iterate' over them?
My C is rusty so I can not quite get the program right to test this.
Here is my code in full.
#include <stdlib.h>
#include <stdio.h>
typedef struct {
int x;
double y;
char z;
} Foo;
int main() {
Foo *f = malloc(sizeof(Foo));
f->x = 10;
f->y = 30.0;
f->z = 'c';
// Pointer to iterate.
for(int i = 0; i == sizeof(Foo); i++) {
if (i == 0) {
printf(*(f + i));
}
else if (i == (sizeof(int) + 1)) {
printf(*(f + i));
}
else if (i ==(sizeof(int) + sizeof(double) + 1)) {
printf(*(f + i));
}
else {
continue;
}
return 0;
}
No, it is not guaranteed for struct members to be contiguous in memory.
From §6.7.2.1 point 15 in the C standard (page 115 here):
There may be unnamed padding within a structure object, but not at its beginning.
Most of the time, something like:
struct mystruct {
int a;
char b;
int c;
};
is indeed aligned to sizeof(int), like this:
0  1  2  3  4  5  6  7  8  9  10 11
[a         ][b][padding][c         ]
Yes and no.
Yes, the members of a struct are allocated within a contiguous block of memory. In your example, an object of type Foo occupies sizeof (Foo) contiguous bytes of memory, and all the members are within that sequence of bytes.
But no, there is no guarantee that the members themselves are adjacent to each other. There can be padding bytes between any two members, or after the last one. The standard does guarantee that the first defined member is at offset 0, and that all the members are allocated in the order in which they're defined (which means you can sometimes save space by reordering the members).
Normally compilers use just enough padding to satisfy the alignment requirements of the member types, but the standard doesn't require that.
So you can't (directly) iterate over the members of a structure. If you want to do that, and if all the members are of the same type, use an array.
You can use the offsetof macro, defined in <stddef.h>, to determine the byte offset of (non-bitfield) member, and it can sometimes be useful to use that to build a data structure that can be used to iterate over the members of a structure. But it's tedious, and rarely more useful than simply referring to the members by name -- particularly if they have different types.
would x, y, and z, be right next to each other in memory?
No. The struct memory allocation layout is implementation dependent - there is no guarantee struct members are right next to each other. One reason is memory padding.
Could pointer arithmetic 'iterate' over them?
No. You can only do pointer arithmetic for pointers to the same type.
would x, y, and z, be right next to each other in memory?
They could be, but don't have to be. The placement of elements in structures is not mandated by the ISO C standard.
In general, the compiler will place the elements at some offset that is "optimal" for the architecture it compiles to. So, on 32-bit CPUs, most compilers will, by default, place elements at offsets that are multiples of 4 (as that will make for most efficient access). But most compilers also have ways to specify different placement (alignment).
So, if you have something like:
struct X {
uint8_t a;
uint32_t b;
};
Then offset of a would be 0, but offset of b would be 4 on most 32-bit compilers with default options.
Could pointer arithmetic 'iterate' over them?
Not like the code in your example. Pointer arithmetic on a pointer to a structure moves the address in steps of the size of the whole structure. So, if you have:
struct X a[2];
struct X *p = a;
then p+1 == a+1.
To "iterate" over elements you would need to cast the p to uint8_t* and then add the offset of the element to it (using offsetof standard macro), element by element.
It depends on the padding decided on by the compiler (which is influenced by the requirements and advantages on the target architecture). The C standard does guarantee that there is to be no padding before the first member of a struct, but after that, you cannot assume anything. However, if the sizeof the struct equals the sum of the sizeof of each of its members, then there is no padding.
You can enforce no padding with a compiler-specific directive. On MSVC, that's:
#pragma pack(push, 1)
// your struct...
#pragma pack(pop)
GCC has __attribute__((packed)) for the equivalent effect.
There are multiple issues with trying to use pointer arithmetic in this manner.
The first issue, as has been mentioned in other answers, is that there could be padding throughout the struct throwing off your calculations.
C11 working draft 6.7.2.1 p15: (bold emphasis mine)
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
The second issue is that pointer arithmetic is done in multiples of the size of the type being pointed to. In the case of a struct, if you add 1 to a pointer to a struct, the pointer will be pointing to an object after the struct. Using your example struct Foo:
Foo x[3];
Foo *y = x+1; // y points to the second Foo (x[1]), not the second byte of x[0]
6.5.6 p8:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist.
A third issue is that performing pointer arithmetic such that the result points more than one past the end of the object causes undefined behavior, as does dereferencing a pointer to one element past the end of the object obtained through the pointer arithmetic. So even if you had a struct containing three ints with no padding in between and took a pointer to the first int and incremented it to point to the second int, dereferencing it would cause undefined behavior.
More from 6.5.6: (bold-italic emphasis mine)
Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
A fourth issue is that dereferencing a pointer to one type as another type results in undefined behavior. This attempt at type-punning is often referred to as a strict-aliasing violation. The following is an example of undefined behavior through strict-aliasing violation even though the data types are the same size (assuming 4-byte int and float) and nicely aligned:
int x = 1;
float y = *(float *)&x;
6.5 p7:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type
of the object,
a type that is the signed or unsigned type corresponding
to the effective type of the object,
a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
an aggregate or union type that includes one of the
aforementioned types among its members (including, recursively, a
member of a subaggregate or contained union), or
a character type.
Summary:
No, a C struct does not necessarily hold its members in contiguous memory, and even if it did, you still couldn't do what you want with pointer arithmetic.

C - Assigning pointers to arrays? Based on four examples

Assume following Code:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv)
{
int arrayXYZ[10];
int i;
int *pi;
int intVar;
for (i=0; i<10; i++){
arrayXYZ[i] = i;
}
pi = arrayXYZ; // Reference 1
pi++; // Reference 2
arrayXYZ++; // Reference 3
arrayXYZ = pi; // Reference 4
}
Reference 1 is correct: pi points to first element in arrayXYZ -> *pi = 0
Reference 2 is correct: element to which pi points is incremented -> *pi = 1
Reference 3 is not correct: I am not completely sure why. Every integer needs 4 bytes of memory. Hence, we cannot increment the address of the head of the array by just one? Assume we had a char array with sizeof(char)=1 -> Would the head of the array point to the next bucket?
Reference 4 is not correct: I am not completely sure why. Why cannot the head of the array point to the address to which pi points?
Thanks for all clarifications!
I am a new member, so if my question doesn't follow the Stackoverflow guidelines, feel free to tell me how I can improve my next questions!
arrayXYZ++;
This is equivalent to:
arrayXYZ += 1;
which is equivalent to:
arrayXYZ = arrayXYZ + 1;
This is not allowed because the C language does not allow it. An array can not be assigned to.
arrayXYZ = pi;
This fails for the same reason. An array can not be assigned to.
The other assignments work because you are allowed to assign to a pointer.
Also keep in mind that arrays and pointers are distinct datatypes. In C, there are circumstances where arrays decay into a pointer to their first element for convenience purposes. Which is why this works:
pi = arrayXYZ;
However, this is just an automatic conversion, so that you don't have to write:
pi = &arrayXYZ[0];
This automatic conversion does not mean that arrays are the same thing as pointers.
From C11 standard §6.3.2.1 (N1570)
An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined. When an object is said to have a particular type, the type is specified by the lvalue used to designate the object. A modifiable lvalue is an lvalue that does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a const-qualified type.
And also From §6.5.2.4
The operand of the postfix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.
These are the reasons why those statements are illegal: the operand of ++ must be a modifiable lvalue, and in the same way the left operand of an assignment must be a modifiable lvalue. An array is not one, which is why both statements are rejected.
Now to explain why the other two work: there is a thing called array decay. In most situations (the exceptions are when the array is the operand of &, sizeof, etc.) an array is converted to a pointer to its first element, and that pointer is what gets assigned to pi. pi itself is a modifiable lvalue, which is why you can apply ++ to it.

Why can't I retrieve my flexible array member size?

OK so I was reading the standard paper (ISO C11) in the part where it explains flexible array members (at 6.7.2.1 p18). It says this:
As a special case, the last element of a structure with more than one
named member may have an incomplete array type; this is called a
flexible array member. In most situations, the flexible array member
is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more
trailing padding than the omission would imply. However, when a . (or
->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member,
it behaves as if that member were replaced with the longest array
(with the same element type) that would not make the structure larger
than the object being accessed; the offset of the array shall remain
that of the flexible array member, even if this would differ from that
of the replacement array. If this array would have no elements, it
behaves as if it had one element but the behavior is undefined if any
attempt is made to access that element or to generate a pointer one
past it.
And here are some of the examples given below (p20):
EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to
use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to
by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in
particular, the offsets of member d might not be the same).
Added spoilers as examples inside the standard are not documentation.
And now my example (extending the one from the standard):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
struct s { int n; double d[]; };
int m = 7;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m])); //create our object
printf("%zu", sizeof(p->d)); //retrieve the size of the flexible array member
free(p); //free our object
}
Now the compiler is complaining that p->d has incomplete type double[] which is clearly not the case according the standard paper. Is this a bug in the GCC compiler?
As a special case, the last element of a structure with more than one named member may have an incomplete array type; ... C11dr 6.7.2.1 18
In the following d is an incomplete type.
struct s { int n; double d[]; };
The sizeof operator shall not be applied to an expression that has function type or an incomplete type ... C11dr §6.5.3.4 1
// This does not change the type of field `d`.
// It (that is `d`) behaves like a `double d[m]`, but it is still an incomplete type.
struct s *p = foo();
// Constraint violation: sizeof applied to an incomplete type
printf("%zu", sizeof(p->d));
This looks like a defect in the Standard. We can see from the paper where flexible array members were standardized, N791 "Solving the struct hack problem", that the struct definition replacement is intended to apply only in evaluated context (to borrow the C++ terminology); my emphasis:
When an lvalue whose type is a structure
with a flexible array member is used to access an object, it behaves as
if that member were replaced by the longest array that would not make
the structure larger than the object being accessed.
Compare the eventual standard language:
[W]hen a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed [...]
Some form of language like "When a . (or ->) operator whose left operand is (a pointer to) a structure with a flexible array member and whose right operand names that member is evaluated [...]" would seem to work to fix it.
(Note that sizeof does not evaluate its argument, except for variable length arrays, which are another kettle of fish.)
There is no corresponding defect report visible via the JTC1/SC22/WG14 website. You might consider submitting a defect report via your ISO national member body, or asking your vendor to do so.
Standard says:
C11-§6.5.3.4/2
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand.
and it also says
C11-§6.5.3.4/1
The sizeof operator shall not be applied to an expression that has function type or an incomplete type, [...]
p->d is of incomplete type and it can't be an operand of sizeof operator. The statement
it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed
doesn't hold for the sizeof operator, as it determines the size from the type of its operand, which must be a complete type.
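Since sizeof cannot report the length of the flexible array member, a common workaround is to record the element count in the struct itself. A minimal sketch, assuming two hypothetical helpers (s_alloc and s_payload_bytes are not part of the question) and a non-negative m:

```c
#include <stddef.h>
#include <stdlib.h>

struct s { int n; double d[]; };

/* Hypothetical helper: allocate a struct s with room for m doubles
   and remember m in the n member. Assumes m >= 0. */
struct s *s_alloc(int m)
{
    /* sizeof p->d[0] is fine: the element type double is complete. */
    struct s *p = malloc(sizeof *p + (size_t)m * sizeof p->d[0]);
    if (p != NULL)
        p->n = m;
    return p;
}

/* Hypothetical helper: number of bytes usable through p->d. */
size_t s_payload_bytes(const struct s *p)
{
    return (size_t)p->n * sizeof p->d[0];
}
```

With this, s_payload_bytes(p) plays the role that sizeof(p->d) cannot, at the cost of keeping n in sync by hand.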
First, what is happening is correct in terms of the standard: arrays declared with [] have incomplete type, and you can't apply the sizeof operator to them.
But there is also a simple reason for it in your case. You never told your compiler that in that particular case the d member should be viewed as having a particular size. You only told malloc the total memory size to be reserved and set p to point to that. The compiler has obtained no type information that could help it deduce the size of the array.
This is different from allocating a variable length array (VLA) or a pointer to VLA:
double (*q)[m] = malloc(sizeof(double[m]));
Here the compiler can know what type of array q is pointing to. But not because you told malloc the total size (that information is not returned from the malloc call) but because m is part of the type specification of q.
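To make the contrast concrete, here is a small sketch of the pointer-to-VLA case. The function name vla_pointee_size is made up for illustration, and it assumes a compiler with VLA support (optional in C11) and m > 0:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: size in bytes of the array a pointer-to-VLA points at.
   Assumes m > 0 and a compiler that supports variable length arrays. */
size_t vla_pointee_size(size_t m)
{
    double (*q)[m] = malloc(sizeof(double[m]));
    if (q == NULL)
        return 0;

    /* The type of *q is double[m], so sizeof is computed at run time from m.
       malloc plays no part in this; only the declared type of q matters. */
    size_t bytes = sizeof *q;

    free(q);
    return bytes;
}
```

Here sizeof *q works because m is part of q's type, which is exactly the information the flexible array member lacks.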
The C Standard is a bit loosey-goosey when it comes to the definition of certain terms in certain contexts. Given something like:
struct foo {uint32_t x; uint16_t y[]; };
char *p = 1024+(char*)malloc(1024); // Point to end of region
struct foo *q1 = (struct foo *)(p -= 512); // Allocate some space from it
... some code which uses *q1
struct foo *q2 = (struct foo *)(p -= 512); // Allocate more space from it
there's no really clear indication of what storage is occupied by objects
*q1 or *q2, nor by q1->y or q2->y. If *q1 will never be accessed afterward,
then q2->y may be treated as a uint16_t[510], but writing to *q1 will trash
the contents of q2->y[254] and above, and writing q2->y[254] and above will
trash *q1. Since a compiler will generally have no way of knowing what will
happen to *q1 in the future, it will have no way of sensibly reporting a size
for q2->y.

Why array type object is not modifiable?

It is stated here that
The term modifiable lvalue is used to emphasize that the lvalue allows the designated object to be changed as well as examined. The following object types are lvalues, but not modifiable lvalues:
An array type
An incomplete type
A const-qualified type
A structure or union type with one of its members qualified as a const type
Because these lvalues are not modifiable, they cannot appear on the left side of an assignment statement.
Why is an array type object not modifiable? Isn't it correct to write
int i = 5, a[10] = {0};
a[i] = 1;
?
And also, what is an incomplete type?
Assume the declaration
int a[10];
then all of the following are true:
the type of the expression a is "10-element array of int"; except when a is the operand of the sizeof or unary & operators, the expression will be converted to an expression of type "pointer to int" and its value will be the address of the first element in the array;
the type of the expression a[i] is int; it refers to the integer object stored as the i'th element of the array;
The expression a may not be the target of an assignment because C does not treat arrays like other variables, so you cannot write something like a = b or a = malloc(n * sizeof *a) or anything like that.
You'll notice I keep emphasizing the word "expression". There's a difference between the chunk of memory we set aside to hold 10 integers and the symbols (expressions) we use to refer to that chunk of memory. We can refer to it with the expression a. We can also create a pointer to that array:
int (*ptr)[10] = &a;
The expression *ptr also has type "10-element array of int", and it refers to the same chunk of memory that a refers to.
C does not treat array expressions (a, *ptr) like expressions of other types, and one of the differences is that an expression of array type may not be the target of an assignment. You cannot reassign a to refer to a different array object (same for the expression *ptr). You may assign a new value to a[i] or (*ptr)[i] (change the value of each array element), and you may assign ptr to point to a different array:
int b[10], c[10];
.....
ptr = &b;
.....
ptr = &c;
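A short sketch of the points above: elements can be assigned through a or *ptr, and ptr itself can be re-pointed, while a stays bound to its storage. The function name array_pointer_demo is only for illustration:

```c
/* Illustrative only: shows which assignments are allowed. */
int array_pointer_demo(void)
{
    int a[10] = {0}, b[10] = {0};
    int (*ptr)[10] = &a;     /* *ptr has type "10-element array of int" */

    (*ptr)[3] = 42;          /* writes a[3] through the pointer */
    ptr = &b;                /* the pointer can be re-pointed; a itself cannot */
    (*ptr)[3] = 7;           /* now writes b[3] */

    /* a = b;  would not compile: an array expression is not assignable */
    return a[3] - b[3];      /* 42 - 7 */
}
```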
As for the second question...
An incomplete type lacks size information; declarations like
struct foo;
int bar[];
union bletch;
all create incomplete types because there isn't enough information for the compiler to determine how much storage to set aside for an object of that type. You cannot create objects of incomplete type; for example, you cannot declare
struct foo myFoo;
unless you complete the definition for struct foo. However, you can create pointers to incomplete types; for example, you could declare
struct foo *myFooPtr;
without completing the definition for struct foo because a pointer just stores the address of the object, and you don't need to know the type's size for that. This makes it possible to define self-referential types like
struct node {
T key; // for any type T
Q val; // for any type Q
struct node *left;
struct node *right;
};
The type definition for struct node isn't complete until we hit that closing }. Since we can declare a pointer to an incomplete type, we're okay. However, we could not define the struct as
struct node {
... // same as above
struct node left;
struct node right;
};
because the type isn't complete when we declare the left and right members, and also because each left and right member would each contain left and right members of their own, each of which would contain left and right members of their own, and on and on and on.
That's great for structs and unions, but what about
int bar[];
???
We've declared the symbol bar and indicated that it will be an array type, but the size is unknown at this point. Eventually we'll have to define it with a size, but this way the symbol can be used in contexts where the array size isn't meaningful or necessary. Don't have a good, non-contrived example off the top of my head to illustrate this, though.
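One possible illustration, offered as a sketch rather than the canonical use case: an extern declaration with no size (typically in a header) lets code index the array before the defining declaration completes the type. The function names first_bar and bar_count are hypothetical:

```c
#include <stddef.h>

extern int bar[];            /* incomplete type: the size is unknown here */

/* Indexing works without the size; sizeof bar would not compile here. */
int first_bar(void)
{
    return bar[0];
}

int bar[] = {10, 20, 30};    /* the definition completes the type */

/* After the definition, the type is complete and sizeof is allowed. */
size_t bar_count(void)
{
    return sizeof bar / sizeof bar[0];
}
```

In a real project the extern line would usually live in a header and the definition in exactly one source file, so only that file knows the size.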
EDIT
Responding to the comments here, since there isn't going to be room in the comments section for what I want to write (I'm in a verbose mood this evening). You asked:
Does it mean every variables are expression?
It means that any variable can be an expression, or part of an expression. Here's how the language standard defines the term expression:
6.5 Expressions
1 An expression is a sequence of operators and operands that specifies computation of a
value, or that designates an object or a function, or that generates side effects, or that
performs a combination thereof.
For example, the variable a all by itself counts as an expression; it designates the array object we defined to hold 10 integer values. It also evaluates to the address of the first element of the array. The variable a can also be part of a larger expression like a[i]; the operator is the subscript operator [] and the operands are the variables a and i. This expression designates a single member of the array, and it evaluates to the value currently stored in that member. That expression in turn can be part of a larger expression like a[i] = 0.
And also let me clear that, in the declaration int a[10], does a[] stands for array type
Yes, exactly.
In C, declarations are based on the types of expressions, rather than the types of objects. If you have a simple variable named y that stores an int value, and you want to access that value, you simply use y in an expression, like
x = y;
The type of the expression y is int, so the declaration of y is written
int y;
If, on the other hand, you have an array of int values, and you want to access a specific element, you would use the array name and an index along with the subscript operator to access that value, like
x = a[i];
The type of the expression a[i] is int, so the declaration of the array is written as
int a[N]; // for some value N.
The "int-ness" of a is given by the type specifier int; the "array-ness" of a is given by the declarator a[N]. The declarator gives us the name of the object being declared (a) along with some additional type information not given by the type specifier ("is an N-element array"). The declaration "reads" as
a -- a
a[N] -- is an N-element array
int a[N]; -- of int
EDIT2
And after all that, I still haven't told you the real reason why array expressions are non-modifiable lvalues. So here's yet another chapter to this book of an answer.
C didn't spring fully formed from the mind of Dennis Ritchie; it was derived from an earlier language known as B (which was derived from BCPL). [1] B was a "typeless" language; it didn't have different types for integers, floats, text, records, etc. Instead, everything was simply a fixed length word or "cell" (essentially an unsigned integer). Memory was treated as a linear array of cells. When you allocated an array in B, such as
auto V[10];
the compiler allocated 11 cells; 10 contiguous cells for the array itself, plus a cell that was bound to V containing the location of the first cell:
    +----+
V:  |    | -----+
    +----+      |
     ...        |
    +----+      |
    |    | <----+
    +----+
    |    |
    +----+
    |    |
    +----+
    |    |
    +----+
     ...
When Ritchie was adding struct types to C, he realized that this arrangement was causing him some problems. For example, he wanted to create a struct type to represent an entry in a file or directory table:
struct {
int inumber;
char name[14];
};
He wanted the structure to not just describe the entry in an abstract manner, but also to represent the bits in the actual file table entry, which didn't have an extra cell or word to store the location of the first element in the array. So he got rid of it - instead of setting aside a separate location to store the address of the first element, he wrote C such that the address of the first element would be computed when the array expression was evaluated.
This is why you can't do something like
int a[N], b[N];
a = b;
because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it during the translation phase.
1. This is all taken from the paper The Development of the C Language
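Since a = b is rejected, copying has to be spelled out some other way. A minimal sketch of two common options, memcpy and wrapping the array in a struct; the names struct wrapper and copy_demo are made up for this example:

```c
#include <string.h>

struct wrapper { int v[4]; };    /* hypothetical struct holding an array */

int copy_demo(void)
{
    int a[4] = {0}, b[4] = {1, 2, 3, 4};
    memcpy(a, b, sizeof a);      /* copies the elements; a = b would not compile */

    struct wrapper x = {{0}}, y = {{5, 6, 7, 8}};
    x = y;                       /* struct assignment copies the embedded array */

    return a[3] + x.v[3];        /* 4 + 8 */
}
```

The struct case works because a struct is an ordinary modifiable lvalue, even when one of its members is an array.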
The term "lvalue of array type" literally refers to the array object as an lvalue of array type, i.e. array object as a whole. This lvalue is not modifiable as a whole, since there's no legal operation that can modify it as a whole. In fact, the only operations you can perform on an lvalue of array type are: unary & (address of), sizeof and implicit conversion to pointer type. None of these operations modify the array, which is why array objects are not modifiable.
a[i] does not work with lvalue of array type. a[i] designates an int object: the i-th element of array a. The semantics of this expression (if spelled out explicitly) is: *((int *) a + i). The very first step - (int *) a - already converts the lvalue of array type into an rvalue of type int *. At this point the lvalue of array type is out of the picture for good.
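The equivalence can be checked directly. This is only a sketch; the function name subscript_equivalences_hold is invented here:

```c
/* Returns 1 when the equivalences described above hold. */
int subscript_equivalences_hold(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int i = 2;

    int same_value   = (a[i] == *(a + i));              /* a converts to int * first */
    int same_address = (&a[i] == a + i);
    int whole_array  = (sizeof a == 5 * sizeof(int));   /* no conversion under sizeof */

    return same_value && same_address && whole_array;
}
```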
Incomplete type is a type whose size is not [yet] known. For example: a struct type that has been declared but not defined, an array type with unspecified size, the void type.
An incomplete type is a type which is declared but not defined, for example struct Foo;.
You can always assign to individual array elements (assuming they are not const). But you cannot assign something to the whole array.
C and C++ are quite confusing in that something like int a[10] = {0, 1, 2, 3}; is not an assignment but an initialization even though it looks pretty much like an assignment.
This is OK (initialization):
int a[10] = {0, 1, 2, 3};
This does not work in C/C++:
int a[10];
a = {0, 1, 2, 3};
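If the goal is to give an existing array a new set of values, one option in C (C99 and later; compound literals are not standard C++) is to memcpy from a compound literal. A sketch, with reset_demo as an illustrative name:

```c
#include <string.h>

int reset_demo(void)
{
    int a[10];

    /* The compound literal is an unnamed int[10]; elements without an
       initializer are zero. The extra parentheses keep the commas inside
       the braces from being split up if memcpy happens to be a macro. */
    memcpy(a, ((int[10]){0, 1, 2, 3}), sizeof a);

    return a[3] + a[9];   /* 3 + 0 */
}
```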
Assuming a is an array of ints, a[10] isn't an array. It is an int.
a = {0} would be illegal.
Remember that in most expressions an array is converted to the address of (a pointer to) its first element. That address is a value, not an object that can be modified. So
int a[10], b[10];
a = b;
is illegal.
It has of course nothing to do with modifying the content of the array as in a[1] = 3.

Resources