I have the following example
#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
typedef struct test{
int a;
long b;
int c;
} test;
int main()
{
test *t = (test*) malloc(offsetof(test, c));
t -> b = 100;
}
It works fine, but Im not sure about it. I think I have UB here. We have a pointer to an object of a structure type. But the object of the structure type is not really valid.
I went through the standard and could not find any definition of this behavior. The only section I could find close to this one is 6.5.3.2:
If an invalid value has been assigned to the pointer, the behavior of
the unary * operator is undefined
But this is not really relevant since the pointer returned by malloc is completely valid.
Is there a reference in the standard explaining such a behavior? I'm using C11 N1570.
From C2011, paragraph 6.2.6.1/4:
Values stored in non-bit-field objects of any other object type consist of n x CHAR_BIT bits, where n is the size of an object of that type, in bytes.
Therefore, since the allocated object in your code is smaller than the size of a struct test, it cannot contain a value of an object of that type.
Now consider your expression t -> b = 100. C2011, paragraph 6.5.2.3/4 defines the behavior of the -> operator:
A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points [...].
(Emphasis added.) We've established that your t does not (indeed, cannot) point to a struct test, however, so the best we can say about 6.5.2.3/4 is that it does not apply to your case. There being no other definition of the behavior of the -> operator, we are left with paragraph 4/2 (emphasis added):
If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime- constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ''undefined behavior'' or by the omission of any explicit definition of behavior.
So there you are. The behavior of your code is undefined.
since the pointer returned by malloc is completely valid.
No the pointer is not "completely valid". Not at all.
Why do you think the pointer is "completely valid"? You didn't allocate enough bytes to hold an entire struct test - the pointer is not "completely valid" as there isn't a valid struct test object for you to access.
There's no such thing as a partial object in C. That's why you can't find it in the C standard.
It works fine
No, it doesn't.
"I didn't observe it blowing up." is not the same as "It works fine."
Your code doesn't do anything observable. Per the as-if rule the compiler is free to elide the entire thing and just return zero from main().
Related
Specifically, is the following code, the line below the marker, OK?
struct S{
int a;
};
#include <stdlib.h>
int main(){
struct S *p;
p = malloc(sizeof(struct S) + 1000);
// This line:
*(&(p->a) + 1) = 0;
}
People have argued here, but no one has given a convincing explanation or reference.
Their arguments are on a slightly different base, yet essentially the same
typedef struct _pack{
int64_t c;
} pack;
int main(){
pack *p;
char str[9] = "aaaaaaaa"; // Input
size_t len = offsetof(pack, c) + (strlen(str) + 1);
p = malloc(len);
// This line, with similar intention:
strcpy((char*)&(p->c), str);
// ^^^^^^^
The intent at least since the standardization of C in 1989 has been that implementations are allowed to check array bounds for array accesses.
The member p->a is an object of type int. C11 6.5.6p7 says that
7 For the purposes of [additive operators] a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
Thus
&(p->a)
is a pointer to an int; but it is also as if it were a pointer to the first element of an array of length 1, with int as the object type.
Now 6.5.6p8 allows one to calculate &(p->a) + 1 which is a pointer to just past the end of the array, so there is no undefined behaviour. However, the dereference of such a pointer is invalid. From Appendix J.2 where it is spelt out, the behaviour is undefined when:
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).
In the expression above, there is only one array, the one (as if) with exactly 1 element. If &(p->a) + 1 is dereferenced, the array with length 1 is accessed out of bounds and undefined behaviour occurs, i.e.
behavior [...], for which [The C11] Standard imposes no requirements
With the note saying that:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
That the most common behaviour is ignoring the situation completely, i.e. behaving as if the pointer referenced the memory location just after, doesn't mean that other kind of behaviour wouldn't be acceptable from the standard's point of view - the standard allows every imaginable and unimaginable outcome.
There has been claims that the C11 standard text has been written vaguely, and the intention of the committee should be that this indeed be allowed, and previously it would have been alright. It is not true. Read the part from the committee response to [Defect Report #017 dated 10 Dec 1992 to C89].
Question 16
[...]
Response
For an array of arrays, the permitted pointer arithmetic in
subclause 6.3.6, page 47, lines 12-40 is to be understood by
interpreting the use of the word object as denoting the specific
object determined directly by the pointer's type and value, not other
objects related to that one by contiguity. Therefore, if an expression
exceeds these permissions, the behavior is undefined. For example, the
following code has undefined behavior:
int a[4][5];
a[1][7] = 0; /* undefined */
Some conforming implementations may
choose to diagnose an array bounds violation, while others may
choose to interpret such attempted accesses successfully with the
obvious extended semantics.
(bolded emphasis mine)
There is no reason why the same wouldn't be transferred to scalar members of structures, especially when 6.5.6p7 says that a pointer to them should be considered to behave the same as a pointer to the first element of an array of length one with the type of the object as its element type.
If you want to address the consecutive structs, you can always take the pointer to the first member and cast that as the pointer to the struct and advance that instead:
*(int *)((S *)&(p->a) + 1) = 0;
This is undefined behavior, as you are accessing something that is not an array (int a within struct S) as an array, and out of bounds at that.
The correct way to achieve what you want, is to use an array without a size as the last struct member:
#include <stdlib.h>
typedef struct S {
int foo; //avoid flexible array being the only member
int a[];
} S;
int main(){
S *p = malloc(sizeof(*p) + 2*sizeof(int));
p->a[0] = 0;
p->a[1] = 42; //Perfectly legal.
}
C standard guarantees that
§6.7.2.1/15:
[...] A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
&(p->a) is equivalent to (int *)p. &(p->a) + 1 will be address of the element of the second struct. In this case, only one element is there, there will not be any padding in the structure so this will work but where there will be padding this code will break and leads to undefined behaviour.
The C99 standard states that:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object
Consider the following code:
struct test {
int x[5];
char something;
short y[5];
};
...
struct test s = { ... };
char *p = (char *) s.x;
char *q = (char *) s.y;
printf("%td\n", q - p);
This obviously breaks the above rule, since the p and q pointers are pointing to different "array objects", and, according to the rule, the q - p difference is undefined.
But in practice, why should such a thing ever result in undefined behaviour? After all, the struct members are laid out sequentially (just as array elements are), with any potential padding between the members. True, the amount of padding will vary across implementations and that would affect the outcome of the calculations, but why should that result be "undefined"?
My question is, can we suppose that the standard is just "ignorant" of this issue, or is there a good reason for not broadening this rule? Couldn't the above rule be rephrased to "both shall point to elements of the same array object or members of the same struct"?
My only suspicion are segmented memory architectures where the members might end up in different segments. Is that the case?
I also suspect that this is the reason why GCC defines its own __builtin_offsetof, in order to have a "standards compliant" definition of the offsetof macro.
EDIT:
As already pointed out, arithmetic on void pointers is not allowed by the standard. It is a GNU extension that fires a warning only when GCC is passed -std=c99 -pedantic. I'm replacing the void * pointers with char * pointers.
Subtraction and relational operators (on type char*) between addresses of member of the same struct are well defined.
Any object can be treated as an array of unsigned char.
Quoting N1570 6.2.6.1 paragraph 4:
Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type
unsigned char [ n ] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value.
...
My only suspicion are segmented memory architectures where the members
might end up in different segments. Is that the case?
No. For a system with a segmented memory architecture, normally the compiler will impose a restriction that each object must fit into a single segment. Or it can permit objects that occupy multiple segments, but it still has to ensure that pointer arithmetic and comparisons work correctly.
Pointer arithmetic requires that the two pointers being added or subtracted to be part of the same object because it doesn't make sense otherwise.
The quoted section of standard specifically refers to two unrelated objects such as int a[b]; and int b[5]. The pointer arithmetic requires to know the type of the object that the pointers pointing to (I am sure you are aware of this already).
i.e.
int a[5];
int *p = &a[1]+1;
Here p is calculated by knowing the that the &a[1] refers to an int object and hence incremented to 4 bytes (assuming sizeof(int) is 4).
Coming to the struct example, I don't think it can possibly be defined in a way to make pointer arithmetic between struct members legal.
Let's take the example,
struct test {
int x[5];
char something;
short y[5];
};
Pointer arithmatic is not allowed with void pointers by C standard (Compiling with gcc -Wall -pedantic test.c would catch that). I think you are using gcc which assumes void* is similar to char* and allows it.
So,
printf("%zu\n", q - p);
is equivalent to
printf("%zu", (char*)q - (char*)p);
as pointer arithmetic is well defined if the pointers point to within the same object and are character pointers (char* or unsigned char*).
Using correct types, it would be:
struct test s = { ... };
int *p = s.x;
short *q = s.y;
printf("%td\n", q - p);
Now, how can q-p be performed? based on sizeof(int) or sizeof(short) ? How can the size of char something; that's in the middle of these two arrays be calculated?
That should explain it's not possible to perform pointer arithmetic on objects of different types.
Even if all members are of same type (thus no type issue as stated above), then it's better to use the standard macro offsetof (from <stddef.h>) to get the difference between struct members which has the similar effect as pointer arithmetic between members:
printf("%zu\n", offsetof(struct test, y) - offsetof(struct test, x));
So I see no necessity to define pointer arithmetic between struct members by the C standard.
Yes, you are allowed to perform pointer arithmetric on structure bytes:
N1570 - 6.3.2.3 Pointers p7:
... When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
This means that for the programmer, bytes of the stucture shall be seen as a continuous area, regardless how it may have been implemented in the hardware.
Not with void* pointers though, that is non-standard compiler extension. As mentioned on paragraph from the standard, it applies only to character type pointers.
Edit:
As mafso pointed out in comments, above is only true as long as type of substraction result ptrdiff_t, has enough range for the result. Since range of size_t can be larger than ptrdiff_t, and if structure is big enough, it's possible that addresses are too far apart.
Because of this it's preferable to use offsetof macro on structure members and calculate result from those.
I believe the answer to this question is simpler than it appears, the OP asks:
but why should that result be "undefined"?
Well, let's see that the definition of undefined behavior is in the draft C99 standard section 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
it is simply behavior for which the standard does not impose a requirement, which perfectly fits this situation, the results are going to vary depending on the architecture and attempting to specify the results would have probably been difficult if not impossible in a portable manner. This leaves the question, why would they choose undefined behavior as opposed to let's say implementation of unspecified behavior?
Most likely it was made undefined behavior to limit the number of ways an invalid pointer could be created, this is consistent with the fact that we are provided with offsetof to remove the one potential need for pointer subtraction of unrelated objects.
Although the standard does not really define the term invalid pointer, we get a good description in Rationale for International Standard—Programming Languages—C which in section 6.3.2.3 Pointers says (emphasis mine):
Implicit in the Standard is the notion of invalid pointers. In
discussing pointers, the Standard typically refers to “a pointer to an
object” or “a pointer to a function” or “a null pointer.” A special
case in address arithmetic allows for a pointer to just past the end
of an array. Any other pointer is invalid.
The C99 rationale further adds:
Regardless how an invalid pointer is created, any use of it yields
undefined behavior. Even assignment, comparison with a null pointer
constant, or comparison with itself, might on some systems result in
an exception.
This strongly suggests to us that a pointer to padding would be an invalid pointer, although it is difficult to prove that padding is not an object, the definition of object says:
region of data storage in the execution environment, the contents of
which can represent values
and notes:
When referenced, an object may be interpreted as having a particular
type; see 6.3.2.1.
I don't see how we can reason about the type or the value of padding between elements of a struct and therefore they are not objects or at least is strongly indicates padding is not meant to be considered an object.
I should point out the following:
from the C99 standard, section 6.7.2.1:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
It isn't so much that the result of pointer subtraction between members is undefined so much as it is unreliable (i.e. not guaranteed to be the same between different instances of the same struct type when the same arithmetic is applied).
Due to the way how memory is layout for structs and its members, I was able to do the following:
typedef struct
{
int a;
int b;
int c;
} myStruct;
In main():
myStruct aStruct;
aStruct.a = 1;
aStruct.b = 2;
aStruct.c = 3;
int *aBadBoyPointer = &aStruct.a;
printf("%i %i %i", *(aBadBoyPointer), *(++aBadBoyPointer), *(++aBadBoyPointer));
Easy enough.
The above line causes a warning:
Unsequenced modification and access to aBadBoyPointer
But it compiles and runs fine printing out 1, 2, 3, respectively.
My question here:
For the sake of doing things correctly, could you cite a scenario where this would break miserably, a scenario that confirms the compiler that this is a bad practice / bad way of doing things?
Or: perhaps this is in fact a "a good way" to do things under some rare circumstances?
Addendum:
Aside from this part that cause Undefined Behavior:
printf("%i %i %i", *(aBadBoyPointer), *(++aBadBoyPointer), *(++aBadBoyPointer));
What I really like to know is:
Is it considered OK (a good practice) using a pointer pointing to one of the members of the struct but then gain access to other members within the struct this way (by incrementing, made possible due to the memory layout of the struct members) or is it a practice that is frowned upon by the Standard?
And secondly as mentioned above, if this is in deed a bad practice, would there ever be cases where using a pointer this way within a struct and then gain access to another member variable beneficial under certain circumstances?
There are a few problems, the warning you are seeing is due the unspecified order of evaluation of the argument to your function. The C99 draft standard section 6.5.2.2 Function calls paragraph 10 says:
The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a sequence point
before the actual call.
you are also modifying a variable more than once within a sequence point which is undefined behavior and can not be relied on to work, section 6.5 Expressions paragraph 2 says (emphasis mine going forward):
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression.72) Furthermore, the prior value shall be read only to determine the value to be stored.73)
Also, note that the standard allows for padding between elements of a struct but beyond that scalar is considered an array of one element and so incrementing beyond the array and then performing indirection would be undefined as well. This is covered in section 6.5.6 Additive operators paragraph 7:
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
and going more than one past the array bounds of accessing one past the array bounds is undefined by section 6.5.6 Additive operators paragraph 8:
[...]If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
We can see depending on the optimization level gcc will output (see it live):
3 3 2
or (see it live):
3 3 3
neither of which is the desired output.
The standards compliant way to access your structs members via a pointer would be to use offsetof and here, which requires including stddef.h. Accessing member a would look like this:
*( (int*) ((char*)aBadBoyPointer+offsetof(myStruct, a)) )
^ ^ ^
3 2 1
There are three elements here:
Use offsetof to determine the offset in bytes of the member
Cast to *char ** since we need pointer arithmetic in bytes
Cast back to *int ** since that is the correct type
I agree with the existing answers (such unsequenced access invokes Undefined Behavior and is bad).
However, to make a concrete example, I compiled your code in MS Visual Studio. The output is (in both Debug and Release mode):
3
3
3
In addition to what was already said in Shafik Yaghmour and nmichaels answers, you must also observe that some compilers will apply an alignment to the variables in the structure, usually by 4 bytes. For instance:
struct something {
char a;
char b;
};
This structure seems to have 2 bytes, but it may have 8 because the compiler may pad each element in the structure to make it cover a memory space divisible by 4. There will be 6 bytes that are just garbage, but they are still reserved. In such exemple, reading the structure as a sequence of char would fail.
Function arguments are not evaluated in an order determined by the C standard (C99 §6.5.2.2) therefore your code invokes undefined behavior. A different compiler or the same compiler on a different platform might give you different results. Under no circumstances is invoking undefined behavior a good way to do things.
For reference, the text of the standard says:
10 The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a sequence point
before the actual call.
Addendum
To answer the second part of your question, the C compiler is free to add padding in between members of a struct per §6.7.2.1 paragraph 12:
Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
There are cases where structures can behave like arrays, and incrementing a pointer to a member can work out for you (see #pragma pack and __attribute__((packed))) but your code will be decidedly (if explicitly) non-portable and you may run into some compiler bugs. In general, use arrays and enumerations for this sort of thing instead of structures.
The compiler is free to add padding between each of a struture's members. If it did the OP's code would fail miserably.
Also this might be undefined behaviour as one may not dereference a pointer which points out of an array's bounds.
However, whether one may consider
int a;
an array of 1 element I'm unsure.
The following compiles and prints "string" as an output.
#include <stdio.h>
struct S { int x; char c[7]; };
struct S bar() {
struct S s = {42, "string"};
return s;
}
int main()
{
printf("%s", bar().c);
}
Apparently this seems to invokes an undefined behavior according to
C99 6.5.2.2/5 If an attempt is made to modify the result of a function
call or to access it after the next sequence point, the behavior is
undefined.
I don't understand where it says about "next sequence point". What's going on here?
You've run into a subtle corner of the language.
An expression of array type is, in most contexts, implicitly converted to a pointer to the first element of the array object. The exceptions, none of which apply here, are:
When the array expression is the operand of a unary & operator (which yields the address of the entire array);
When it's the operand of a unary sizeof or (as of C11) _Alignof operator (sizeof arr yields the size of the array, not the size of a pointer); and
When it's a string literal in an initializer used to initialize an array object (char str[6] = "hello"; doesn't convert "hello" to a char*.)
(The N1570 draft incorrectly adds _Alignof to the list of exceptions. In fact, for reasons that are not clear, _Alignof can only be applied to a type name, not to an expression.)
Note that there's an implicit assumption: that the array expression refers to an array object in the first place. In most cases, it does (the simplest case is when the array expression is the name of a declared array object) -- but in this one case, there is no array object.
If a function returns a struct, the struct result is returned by value. In this case, the struct contains an array, giving us an array value with no corresponding array object, at least logically. So the array expression bar().c decays to a pointer to the first element of ... er, um, ... an array object that doesn't exist.
The 2011 ISO C standard addresses this by introducing "temporary lifetime", which applies only to "A non-lvalue expression with structure or union type, where the structure or union
contains a member with array type" (N1570 6.2.4p8). Such an object may not be modified, and its lifetime ends at the end of the containing full expression or full declarator.
So as of C2011, your program's behavior is well defined. The printf call gets a pointer to the first element of an array that's part of a struct object with temporary lifetime; that object continues to exist until the printf call finishes.
But as of C99, the behavior is undefined -- not necessarily because of the clause you quote (as far as I can tell, there is no intervening sequence point), but because C99 doesn't define the array object that would be necessary for the printf to work.
If your goal is to get this program to work, rather than to understand why it might fail, you can store the result of the function call in an explicit object:
const struct s result = bar();
printf("%s", result.c);
Now you have a struct object with automatic, rather than temporary, storage duration, so it exists during and after the execution of the printf call.
The sequence point occurs at the end of the full expression- i.e., when printf returns in this example. There are other cases where sequence points occur
Effectively, this rule states that function temporaries do not live beyond the next sequence point- which in this case, occurs well after it's use, so your program has quite well-defined behaviour.
Here's a simple example of not well-defined behaviour:
char* c = bar().c; *c = 5; // UB
Here, the sequence point is met after c is created, and the memory it points to is destroyed, but we then attempt to access c, resulting in UB.
In C99 there is a sequence point at the call to a function, after the arguments have been evaluated (C99 6.5.2.2/10).
So, when bar().c is evaluated, it results in a pointer to the first element in the char c[7] array in the struct returned by bar(). However, that pointer gets copied into an argument (a nameless argument as it happens) to printf(), and by the time the call is actually made to the printf() function the sequence point mentioned above has occurred, so the member that the pointer was pointing to may no longer be alive.
As Keith Thomson mentions, C11 (and C++) make stronger guarantees about the lifetime of temporaries, so the behavior under those standards would not be undefined.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Is the “struct hack” technically undefined behavior?
Normally accessing an array beyond its end is undefined behavior in C. For example:
int foo[1];
foo[5] = 1; //Undefined behavior
Is it still undefined behavior if I know that the memory area after the end of the array has been allocated, with malloc or on the stack? Here is an example:
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
int len;
int data[1];
} MyStruct;
int main(void)
{
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
foo->data[5] = 1;
}
I have seen this patten used in several places to make a variable length struct, and it seems to work in practice. Is it technically undefined behavior?
What you are describing is affectionately called "the struct hack". It's not clear if it's completely okay, but it was and is widely used.
As of late (C99), it has started to be replaced by the "flexible array member", where you're allowed to put an int data[]; field if it's the last field in the struct.
Under 6.5.6 Additive operators:
Semantics
8 - [...] If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. [...] If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
If the memory is allocated by malloc then:
7.22.3 Memory management functions
1 - [...] The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation.
This does not however countenance the use of such memory without an appropriate cast, so for MyStruct as defined above only the declared members of the object can be used. This is why flexible array members (6.7.2.1:18) were added.
Also note that appendix J.2 Undefined behavior calls out array access:
1 - The behavior is undefined in the following circumstances: [...]
— Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that does not point into, or just beyond, the same array
object.
— Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that points just beyond the array object and is used as
the operand of a unary * operator that is evaluated.
— An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]).
So, as you note this would be undefined behaviour:
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
foo->data[5] = 1;
However, you would be allowed to do the following:
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
((int *) foo)[(offsetof(MyStruct, data) / sizeof(int)) + 5] = 1;
C++ is laxer in this regard; 3.9.2 Compound types [basic.compound] has:
3 - [...] If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
This makes sense considered in the light of C's more aggressive optimisation opportunities for pointers, e.g. with the restrict qualifier.
The C99 rationale document talks about this in section 6.7.2.1.
A new feature of C99: There is a common idiom known as the “struct hack” for creating a structure containing a variable-size array:
...
The validity of this construct has always been questionable. In the response to one Defect Report, the Committee decided that it was undefined behavior because the array p->items contains only one item, irrespective of whether the space exists. An alternative construct was
suggested: make the array size larger than the largest possible case (for example, using int items[INT_MAX];), but this approach is also undefined for other reasons.
The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced. Apart from the empty brackets, and the removal of the “-1” in the malloc call,
this is used in the same way as the struct hack, but is now explicitly valid code.
The struct hack is undefined behavior, as supported not only be the C specification itself (I'm sure there are citations in the other answers) but the committee has even recorded its opinion.
So the answer is yes, it is undefined behavior according to the standard document, but it is well defined according to the de facto C standard. I imagine most compiler writers are intimately familiar with the hack. From GCC's tree-vrp.c:
/* Accesses after the end of arrays of size 0 (gcc
extension) and 1 are likely intentional ("struct
hack"). */
I think there's a good chance you might even find the struct hack in compiler test suites.