Due to the way memory is laid out for structs and their members, I was able to do the following:
typedef struct
{
int a;
int b;
int c;
} myStruct;
In main():
myStruct aStruct;
aStruct.a = 1;
aStruct.b = 2;
aStruct.c = 3;
int *aBadBoyPointer = &aStruct.a;
printf("%i %i %i", *(aBadBoyPointer), *(++aBadBoyPointer), *(++aBadBoyPointer));
Easy enough.
The above line causes a warning:
Unsequenced modification and access to aBadBoyPointer
But it compiles and runs fine printing out 1, 2, 3, respectively.
My question here:
For the sake of doing things correctly, could you cite a scenario where this would break miserably, one that confirms the compiler's warning that this is bad practice / a bad way of doing things?
Or: perhaps this is in fact a "good way" to do things under some rare circumstances?
Addendum:
Aside from this part that causes Undefined Behavior:
printf("%i %i %i", *(aBadBoyPointer), *(++aBadBoyPointer), *(++aBadBoyPointer));
What I really like to know is:
Is it considered OK (good practice) to use a pointer to one member of a struct and then access other members through it (by incrementing, which the memory layout of the struct members makes possible), or is this a practice that is frowned upon by the Standard?
And secondly, as mentioned above, if this is indeed bad practice, would there ever be cases where using a pointer this way to reach another member of the struct is beneficial under certain circumstances?
There are a few problems. The warning you are seeing is due to the unspecified order of evaluation of the arguments to your function. The C99 draft standard, section 6.5.2.2 Function calls, paragraph 10 says:
The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a sequence point
before the actual call.
You are also modifying a variable more than once between sequence points, which is undefined behavior and cannot be relied on to work. Section 6.5 Expressions paragraph 2 says (emphasis mine going forward):
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression.72) Furthermore, the prior value shall be read only to determine the value to be stored.73)
Also, note that the standard allows for padding between elements of a struct. Beyond that, a scalar is considered an array of one element, so incrementing past the end of that array and then performing indirection would be undefined as well. This is covered in section 6.5.6 Additive operators paragraph 7:
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
and going more than one past the end of the array, or dereferencing the one-past-the-end pointer, is undefined by section 6.5.6 Additive operators paragraph 8:
[...]If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
We can see that, depending on the optimization level, gcc will output:
3 3 2
or:
3 3 3
neither of which is the desired output.
The standards-compliant way to access your struct's members via a pointer would be to use offsetof, which requires including stddef.h. Accessing member a would look like this:
*( (int*) ((char*)aBadBoyPointer+offsetof(myStruct, a)) )
   ^       ^                     ^
   3       2                     1
There are three elements here:
Use offsetof to determine the offset in bytes of the member
Cast to char * since we need pointer arithmetic in bytes
Cast back to int * since that is the correct type
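Putting those three pieces together, a minimal compilable sketch of this approach (reusing the struct from the question; the member values 1, 2 and 3 are the ones assigned above) might look like this:

#include <stdio.h>
#include <stddef.h>   /* offsetof */

typedef struct
{
    int a;
    int b;
    int c;
} myStruct;

int main(void)
{
    myStruct aStruct;
    aStruct.a = 1;
    aStruct.b = 2;
    aStruct.c = 3;

    char *base = (char *)&aStruct;                        /* byte-wise view of the object */
    int va = *(int *)(base + offsetof(myStruct, a));
    int vb = *(int *)(base + offsetof(myStruct, b));
    int vc = *(int *)(base + offsetof(myStruct, c));

    printf("%i %i %i\n", va, vb, vc);                     /* prints 1 2 3 */
    return 0;
}

Each member is located from the struct's own layout via offsetof, so nothing relies on the members happening to be adjacent in memory.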
I agree with the existing answers (such unsequenced access invokes Undefined Behavior and is bad).
However, to make a concrete example, I compiled your code in MS Visual Studio. The output is (in both Debug and Release mode):
3 3 3
In addition to what was already said in Shafik Yaghmour's and nmichaels's answers, note that some compilers will apply alignment to the members of the structure, usually to 4 bytes. For instance:
struct something {
char a;
char b;
};
This structure seems to take 2 bytes, but it may take 8, because the compiler may pad each member of the structure so that it occupies a memory span divisible by 4. In that case there will be 6 bytes that are just garbage, but they are still reserved, and reading the structure as a plain sequence of char would fail.
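To see what a particular compiler actually does, here is a small sketch that prints sizes and offsets; the numbers are implementation-defined, and the struct mixed type is invented purely for illustration:

#include <stdio.h>
#include <stddef.h>

struct something {
    char a;
    char b;
};

struct mixed {
    char a;
    int  b;   /* padding may be inserted before this member */
};

int main(void)
{
    /* Both values are implementation-defined: struct something is usually 2,
       while struct mixed is typically 8 rather than 5 on platforms where
       int requires 4-byte alignment. */
    printf("sizeof(struct something) = %zu\n", sizeof(struct something));
    printf("sizeof(struct mixed)     = %zu (offset of b: %zu)\n",
           sizeof(struct mixed), offsetof(struct mixed, b));
    return 0;
}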
Function arguments are not evaluated in an order determined by the C standard (C99 §6.5.2.2), so your code invokes undefined behavior. A different compiler, or the same compiler on a different platform, might give you different results. Under no circumstances is invoking undefined behavior a good way to do things.
For reference, the text of the standard says:
10 The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a sequence point
before the actual call.
Addendum
To answer the second part of your question, the C compiler is free to add padding in between members of a struct per §6.7.2.1 paragraph 12:
Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
There are cases where structures can behave like arrays, and incrementing a pointer to a member can work out for you (see #pragma pack and __attribute__((packed))) but your code will be decidedly (if explicitly) non-portable and you may run into some compiler bugs. In general, use arrays and enumerations for this sort of thing instead of structures.
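As a rough sketch of that last suggestion (the enum names FIELD_A and so on are invented for the example), the same three values kept in an array can be walked with a pointer without any of the problems above:

#include <stdio.h>

/* Instead of struct { int a; int b; int c; }, use an array indexed by an
   enum: walking it with a pointer is then fully defined behavior. */
enum { FIELD_A, FIELD_B, FIELD_C, FIELD_COUNT };

int main(void)
{
    int fields[FIELD_COUNT] = { 1, 2, 3 };

    for (int *p = fields; p != fields + FIELD_COUNT; ++p)
        printf("%i ", *p);
    printf("\n");                      /* prints: 1 2 3 */
    return 0;
}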
The compiler is free to add padding between each of a structure's members. If it did, the OP's code would fail miserably.
Also, this might be undefined behaviour, as one may not dereference a pointer that points outside an array's bounds.
However, whether one may consider
int a;
an array of one element, I'm unsure.
Related
I noticed this warning from Clang:
warning: performing pointer arithmetic on a null pointer
has undefined behavior [-Wnull-pointer-arithmetic]
In details, it is this code which triggers this warning:
int *start = ((int*)0);
int *end = ((int*)0) + count;
The constant literal zero converted to any pointer type yields a null pointer, which does not point to any contiguous area of memory but still has the pointer type needed to do pointer arithmetic.
Why would arithmetic on a null pointer be forbidden when doing the same on a non-null pointer obtained from an integer other than zero does not trigger any warning?
And more importantly, does the C standard explicitly forbid null pointer arithmetic?
Also, this code will not trigger the warning, but this is because the pointer is not evaluated at compile time:
int *start = ((int*)0);
int *end = start + count;
But a good way of avoiding the undefined behavior is to explicitly cast an integer value to the pointer:
int *end = (int *)(sizeof(int) * count);
The C standard does not allow it.
6.5.6 Additive operators (emphasis mine)
8 When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the pointer
operand. If the pointer operand points to an element of an array
object, and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals the
integer expression. In other words, if the expression P points to the
i-th element of an array object, the expressions (P)+N (equivalently,
N+(P)) and (P)-N (where N has the value n) point to, respectively, the
i+n-th and i-n-th elements of the array object, provided they exist.
Moreover, if the expression P points to the last element of an array
object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element
of an array object, the expression (Q)-1 points to the last element of
the array object. If both the pointer operand and the result point to
elements of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow; otherwise,
the behavior is undefined. If the result points one past the last
element of the array object, it shall not be used as the operand of a
unary * operator that is evaluated.
For the purposes of the above, a pointer to a single object is considered as pointing into an array of 1 element.
Now, ((int*)0) does not point at an element of an array object, simply because a pointer holding a null pointer value does not point at any object. This is stated in:
6.3.2.3 Pointers
3 If a null pointer constant is converted to a pointer type, the
resulting pointer, called a null pointer, is guaranteed to compare
unequal to a pointer to any object or function.
So you can't do arithmetic on it. The warning is justified, because as the second highlighted sentence mentions, we are in the case of undefined behavior.
Don't be fooled by the fact that the offsetof macro is possibly implemented like that. The standard library is not bound by the constraints placed on user programs; it can rely on deeper knowledge of the implementation. But doing this in our own code is not well defined.
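If the goal of the start/end pair is simply to describe a span of count ints, here is a sketch of a well-defined alternative (count and the allocation are stand-ins for whatever the real code does) that avoids the null pointer entirely:

#include <stdlib.h>

int main(void)
{
    size_t count = 16;                 /* stand-in for the question's count */

    /* Size in bytes: plain integer arithmetic, no pointer involved. */
    size_t nbytes = count * sizeof(int);

    /* If an end pointer is really needed, derive it from a real object:
       arithmetic within an allocated array, up to one past its last
       element, is well defined. */
    int *start = malloc(nbytes);
    if (start == NULL)
        return 1;
    int *end = start + count;          /* one past the last element: OK */

    (void)end;
    free(start);
    return 0;
}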
When the C Standard was written, the vast majority of C implementations would, for any non-void* pointer value p, uphold the invariants that p+0 and p-0 both yield p, and p-p will yield zero. More generally, operations like a size-zero memcpy or fwrite that operate on a buffer of size N would ignore the buffer address when N was zero. Such behavior would allow programmers to avoid having to write code to handle corner cases. For example, code to output a packet with an optional payload passed via address and length arguments would naturally process (NULL,0) as an empty payload.
Nothing in the published Rationale for the C Standard suggests that implementations whose target platforms would naturally behave in such fashion shouldn't continue to work as they always had. There were, however, a few platforms where it may have been expensive to uphold such behavioral guarantees in cases where p is null.
As with most situations where the vast majority of C implementations would process a construct identically, but implementations might exist where such treatment would be impractical, the Standard characterizes the addition of zero to a null pointer as Undefined Behavior. The Standard allows implementations to, as a form of "conforming language extension", define the behavior of constructs in cases where it imposes no requirements, and it allows conforming (but not strictly conforming) programs to make use of them. According to the published Rationale, the stated intention was that support for such "popular extensions" be regarded as a "quality of implementation" issue to be decided by the marketplace. Implementations that could support them at essentially zero cost would do so, but implementations where such support would be expensive would be free to support such constructs or not based upon their customers' needs.
If one is using a compiler that targets commonplace platforms, and is designed to process the widest range of useful programs reasonably efficiently, then the extended semantics surrounding pointer arithmetic may allow one to write code more efficiently than would otherwise be possible. If one is targeting a compiler that does not value compatibility with quality compilers, however, one should recognize that it may treat the Standard's allowance for quirky hardware as an invitation to behave nonsensically even on commonplace hardware. Of course, one should also be aware that such compilers may behave nonsensically in corner cases where adherence with the Standard would require them to forego optimizations that are unsound but would "usually" be safe.
As mentioned in several other questions and answers, a function (in C99 or newer) defined this way
void func(int ptr[static 1]){
//do something with ptr, knowing that ptr != NULL
}
has one parameter (ptr) of type pointer to int, and the compiler can assume that the function will never be called with null as the argument (e.g. the compiler can optimize null pointer checks away, or warn if func is called with a null pointer; and yes, I know the compiler is not required to do any of that...).
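To illustrate that last point, a small sketch; whether a given compiler actually warns or removes the check is, as noted, not guaranteed:

#include <stdio.h>

void func(int ptr[static 1])
{
    /* Because of [static 1] the compiler may assume ptr is non-null, so it
       is entitled to drop this check (and possibly warn about it). */
    if (ptr == NULL) {
        puts("unreachable under the [static 1] contract");
        return;
    }
    printf("%d\n", *ptr);
}

int main(void)
{
    int x = 42;
    func(&x);          /* fine */
    /* func(NULL);        would violate the contract; some compilers diagnose it */
    return 0;
}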
C17 section 6.7.6.3 Function declarators (including prototypes) paragraph 7 says:
A declaration of a parameter as “array of type” shall be adjusted to “qualified pointer to type”, where
the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. If the
keyword static also appears within the [ and ] of the array type derivation, then for each call to
the function, the value of the corresponding actual argument shall provide access to the first element
of an array with at least as many elements as specified by the size expression.
In the case of the definition above, the value of ptr has to provide access to the first element of an array with at least 1 element. It is therefore clear that the argument can never be null.
What I'm wondering is whether it is valid to call such a function with the address of an int that is not part of an array. E.g., is this (given the definition of func above) technically valid, or is it undefined behavior:
int var = 5;
func(&var);
I am aware that this will practically never be an issue, because no compiler I know of differentiates between a pointer to a member of an int array and a pointer to a local int variable. But given that a pointer in C (at least from the perspective of the standard) can be much more than just an integer with a special compile-time type, I wondered if there is some section in the standard that makes this valid.
I do suspect that it is actually not valid, as section 6.5.6 Additive operators paragraph 8 contains:
[...] If both the pointer operand and the result point
to elements of the same array object, or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined. [...]
To me that sounds as if adding 1 is a valid operation for any pointer that points to an array element, while it would be UB to add 1 to a pointer that points to a regular variable. That would mean that there is indeed a difference between a pointer to an array element and a pointer to a normal variable, which would make the snippet above UB...
Section 6.5.6 Additive operators paragraph 7 contains:
For the purposes of these operators, a pointer to an object that is not an element of an array behaves
the same as a pointer to the first element of an array of length one with the type of the object as its
element type.
As the paragraph begins with "for the purposes of these operators" I suspect that there can be a difference in other contexts?
tl;dr;
Is there some section of the standard that specifies that there is no difference between a pointer to a regular variable of type T and a pointer to the element of an array of length one (type T[1])?
At face value, I think you have a point. We aren't really passing a pointer to the first element of an array. This may be UB if we consider the standard in a vacuum.
Other than the paragraph you quote in 6.5.6, there is no passage in the standard equating a single object to an array of one element. And there shouldn't be, since the two things are different: an array (of even one element) is implicitly converted to a pointer when it appears in most expressions, which is obviously not a property most object types possess.
The definition of the static keyword in [] says that the pointer being passed must point to the initial element of an array that contains at least a certain number of elements. There is another problem with the wording you cited: what about
int a[2];
func(a + 1);
Clearly the pointer being passed is not to the first element of an array. That is UB too if we take a literal interpretation of 6.7.6.3p7.
Putting the static keyword aside, when a function accepts a pointer to an object, whether the object is a member of an array (of any size) or not matters in only one context: pointer arithmetic.
In the absence of pointer arithmetic, there is no distinguishable difference in behavior when using a pointer to access an element of an array, or a standalone object.
I would argue that the intent behind 6.7.6.3p7 has pointer arithmetic in mind, and so the semantics being described come hand in hand with trying to do pointer arithmetic on the pointer passed into the function.
The use of static 1 simply emerged naturally as a useful idiom, and maybe wasn't the intent from the get-go. While the normative text may need a slight correction, I think the intent behind it is clear: it isn't meant to be undefined behavior by the standard.
The authors of the Standard almost certainly intended that quality implementations would treat the value of a pointer to a non-array object in the same way as it would treat the value of a pointer to the first element of an array object of length 1. Had it merely said that a pointer to a non-array object was equivalent to a pointer to an array, however, that might have been misinterpreted as applying to all expressions that yield pointer values. This could cause problems given e.g. char a[1],*p=a;, because the expressions a and p both yield pointers of type char* with the same value, but sizeof p and sizeof a would likely yield different values.
The language was in wide use before the Standard was written, and it was hardly uncommon for programs to rely upon such behavior. Implementations that make a bona fide effort to behave in a fashion consistent with the Standard Committee's intentions as documented in the published Rationale document should thus be expected to process such code meaningfully without regard for whether a pedantic reading of the Standard would require it. Implementations that do not make such efforts, however, should not be trusted to process such code meaningfully.
I have the following example
#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
typedef struct test{
int a;
long b;
int c;
} test;
int main()
{
test *t = (test*) malloc(offsetof(test, c));
t -> b = 100;
}
It works fine, but I'm not sure about it. I think I have UB here. We have a pointer to an object of a structure type, but the object of the structure type is not really valid.
I went through the standard and could not find any definition of this behavior. The closest section I could find is 6.5.3.2:
If an invalid value has been assigned to the pointer, the behavior of
the unary * operator is undefined
But this is not really relevant, since the pointer returned by malloc is completely valid.
Is there a reference in the standard explaining this behavior? I'm using C11 (N1570).
From C2011, paragraph 6.2.6.1/4:
Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes.
Therefore, since the allocated object in your code is smaller than the size of a struct test, it cannot contain a value of an object of that type.
Now consider your expression t -> b = 100. C2011, paragraph 6.5.2.3/4 defines the behavior of the -> operator:
A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points [...].
(Emphasis added.) We've established that your t does not (indeed, cannot) point to a struct test, however, so the best we can say about 6.5.2.3/4 is that it does not apply to your case. There being no other definition of the behavior of the -> operator, we are left with paragraph 4/2 (emphasis added):
If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime- constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ''undefined behavior'' or by the omission of any explicit definition of behavior.
So there you are. The behavior of your code is undefined.
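For contrast, a minimal sketch of the well-defined version, which simply allocates the complete object:

#include <stdlib.h>

typedef struct test {
    int a;
    long b;
    int c;
} test;

int main(void)
{
    /* Allocate the whole object; every member access is then defined. */
    test *t = malloc(sizeof *t);
    if (t == NULL)
        return 1;
    t->b = 100;
    free(t);
    return 0;
}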
since the pointer returned by malloc is completely valid.
No, the pointer is not "completely valid". Not at all.
Why do you think the pointer is "completely valid"? You didn't allocate enough bytes to hold an entire struct test; the pointer is not "completely valid" because there isn't a valid struct test object for you to access.
There's no such thing as a partial object in C. That's why you can't find it in the C standard.
It works fine
No, it doesn't.
"I didn't observe it blowing up." is not the same as "It works fine."
Your code doesn't do anything observable. Per the as-if rule the compiler is free to elide the entire thing and just return zero from main().
The C99 standard states that:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object
Consider the following code:
struct test {
int x[5];
char something;
short y[5];
};
...
struct test s = { ... };
char *p = (char *) s.x;
char *q = (char *) s.y;
printf("%td\n", q - p);
This obviously breaks the above rule, since the p and q pointers are pointing to different "array objects", and, according to the rule, the q - p difference is undefined.
But in practice, why should such a thing ever result in undefined behaviour? After all, the struct members are laid out sequentially (just as array elements are), with any potential padding between the members. True, the amount of padding will vary across implementations and that would affect the outcome of the calculations, but why should that result be "undefined"?
My question is, can we suppose that the standard is just "ignorant" of this issue, or is there a good reason for not broadening this rule? Couldn't the above rule be rephrased to "both shall point to elements of the same array object or members of the same struct"?
My only suspicion is segmented memory architectures, where the members might end up in different segments. Is that the case?
I also suspect that this is the reason why GCC defines its own __builtin_offsetof, in order to have a "standards compliant" definition of the offsetof macro.
EDIT:
As already pointed out, arithmetic on void pointers is not allowed by the standard. It is a GNU extension that fires a warning only when GCC is passed -std=c99 -pedantic. I'm replacing the void * pointers with char * pointers.
Subtraction and relational operators (on type char*) between addresses of members of the same struct are well defined.
Any object can be treated as an array of unsigned char.
Quoting N1570 6.2.6.1 paragraph 4:
Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type
unsigned char [ n ] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value.
...
My only suspicion are segmented memory architectures where the members
might end up in different segments. Is that the case?
No. For a system with a segmented memory architecture, normally the compiler will impose a restriction that each object must fit into a single segment. Or it can permit objects that occupy multiple segments, but it still has to ensure that pointer arithmetic and comparisons work correctly.
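Under that reading, the byte-wise version of the question's subtraction is well defined; here is a sketch (the printed value itself, including any padding after something, is implementation-specific):

#include <stdio.h>
#include <stddef.h>

struct test {
    int x[5];
    char something;
    short y[5];
};

int main(void)
{
    struct test s;
    unsigned char *p = (unsigned char *)s.x;   /* bytes of one and the same object */
    unsigned char *q = (unsigned char *)s.y;

    /* Two views of the same distance; the value is implementation-defined,
       but the subtraction itself is well defined under the reading above. */
    printf("%td\n", q - p);
    printf("%zu\n", offsetof(struct test, y) - offsetof(struct test, x));
    return 0;
}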
Pointer arithmetic requires the two pointers being subtracted to be part of the same object, because it doesn't make sense otherwise.
The quoted section of the standard specifically refers to two unrelated objects, such as int a[5]; and int b[5];. Pointer arithmetic requires knowing the type of the object that the pointers point to (I am sure you are aware of this already).
i.e.
int a[5];
int *p = &a[1]+1;
Here p is calculated knowing that &a[1] refers to an int object and is hence incremented by 4 bytes (assuming sizeof(int) is 4).
Coming to the struct example, I don't think it can possibly be defined in a way that makes pointer arithmetic between struct members legal.
Let's take the example,
struct test {
int x[5];
char something;
short y[5];
};
Pointer arithmetic is not allowed on void pointers by the C standard (compiling with gcc -Wall -pedantic test.c would catch that). I think you are using gcc, which treats void* like char* and allows it.
So,
printf("%zu\n", q - p);
is equivalent to
printf("%zu", (char*)q - (char*)p);
as pointer arithmetic is well defined when the pointers point within the same object and are character pointers (char* or unsigned char*).
Using correct types, it would be:
struct test s = { ... };
int *p = s.x;
short *q = s.y;
printf("%td\n", q - p);
Now, how can q - p be computed? Based on sizeof(int) or sizeof(short)? And how should the char something; that sits between the two arrays be accounted for?
That should explain why it's not possible to perform pointer arithmetic on objects of different types.
Even if all members are of the same type (so there is no type issue as above), it's better to use the standard macro offsetof (from <stddef.h>) to get the distance between struct members, which has an effect similar to pointer arithmetic between members:
printf("%zu\n", offsetof(struct test, y) - offsetof(struct test, x));
So I see no necessity for the C standard to define pointer arithmetic between struct members.
Yes, you are allowed to perform pointer arithmetic on the bytes of a structure:
N1570 - 6.3.2.3 Pointers p7:
... When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
This means that, for the programmer, the bytes of the structure shall be seen as a contiguous area, regardless of how it may have been laid out in the hardware.
This does not apply to void* pointers, though; that is a non-standard compiler extension. As the quoted paragraph from the standard says, it applies only to pointers to character types.
Edit:
As mafso pointed out in the comments, the above is only true as long as ptrdiff_t, the type of the subtraction result, has enough range for the result. Since the range of size_t can be larger than that of ptrdiff_t, if the structure is big enough it is possible that the addresses are too far apart.
Because of this it's preferable to use the offsetof macro on the structure members and calculate the result from those.
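A small sketch of the byte-wise access that 6.3.2.3p7 permits (a smaller, made-up struct is used here just to keep the output short; what the padding bytes contain is implementation-specific):

#include <stdio.h>

struct test {
    int x;
    char something;
    short y;
};

int main(void)
{
    struct test s = { 1, 'a', 2 };
    const unsigned char *bytes = (const unsigned char *)&s;

    /* Stepping byte by byte stays within the object, as 6.3.2.3p7 allows;
       the values of any padding bytes are unspecified. */
    for (size_t i = 0; i < sizeof s; ++i)
        printf("%02x ", bytes[i]);
    printf("\n");
    return 0;
}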
I believe the answer to this question is simpler than it appears, the OP asks:
but why should that result be "undefined"?
Well, let's look at the definition of undefined behavior in the draft C99 standard, section 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
It is simply behavior for which the standard imposes no requirement, which fits this situation perfectly: the results are going to vary depending on the architecture, and attempting to specify them would probably have been difficult, if not impossible, in a portable manner. This leaves the question: why would they choose undefined behavior as opposed to, say, implementation-defined or unspecified behavior?
Most likely it was made undefined behavior to limit the number of ways an invalid pointer could be created. This is consistent with the fact that we are provided with offsetof, which removes the one potential need for pointer subtraction of unrelated objects.
Although the standard does not really define the term invalid pointer, we get a good description in the Rationale for International Standard—Programming Languages—C, which in section 6.3.2.3 Pointers says (emphasis mine):
Implicit in the Standard is the notion of invalid pointers. In
discussing pointers, the Standard typically refers to “a pointer to an
object” or “a pointer to a function” or “a null pointer.” A special
case in address arithmetic allows for a pointer to just past the end
of an array. Any other pointer is invalid.
The C99 rationale further adds:
Regardless how an invalid pointer is created, any use of it yields
undefined behavior. Even assignment, comparison with a null pointer
constant, or comparison with itself, might on some systems result in
an exception.
This strongly suggests that a pointer to padding would be an invalid pointer, although it is difficult to prove that padding is not an object. The definition of object says:
region of data storage in the execution environment, the contents of
which can represent values
and notes:
When referenced, an object may be interpreted as having a particular
type; see 6.3.2.1.
I don't see how we can reason about the type or the value of padding between elements of a struct; therefore padding is not an object, or at least this strongly indicates that padding is not meant to be considered an object.
I should point out the following:
from the C99 standard, section 6.7.2.1:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
It isn't so much that the result of pointer subtraction between members is undefined as that it is unreliable (i.e. not guaranteed to be the same between different instances of the same struct type when the same arithmetic is applied).
Normally accessing an array beyond its end is undefined behavior in C. For example:
int foo[1];
foo[5] = 1; //Undefined behavior
Is it still undefined behavior if I know that the memory area after the end of the array has been allocated, with malloc or on the stack? Here is an example:
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
int len;
int data[1];
} MyStruct;
int main(void)
{
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
foo->data[5] = 1;
}
I have seen this pattern used in several places to make a variable-length struct, and it seems to work in practice. Is it technically undefined behavior?
What you are describing is affectionately called "the struct hack". It's not clear if it's completely okay, but it was and is widely used.
More recently (as of C99), it has started to be replaced by the "flexible array member", where you're allowed to declare an int data[]; field as long as it's the last field in the struct.
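For reference, a sketch of the flexible-array-member form of the question's struct, which makes the same access well defined:

#include <stdlib.h>

typedef struct
{
    int len;
    int data[];          /* flexible array member: must be the last member */
} MyStruct;

int main(void)
{
    size_t n = 10;
    MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * n);
    if (foo == NULL)
        return 1;

    foo->len = (int)n;
    foo->data[5] = 1;    /* well defined: the allocation provides n elements */

    free(foo);
    return 0;
}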
Under 6.5.6 Additive operators:
Semantics
8 - [...] If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. [...] If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
If the memory is allocated by malloc then:
7.22.3 Memory management functions
1 - [...] The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation.
This does not however countenance the use of such memory without an appropriate cast, so for MyStruct as defined above only the declared members of the object can be used. This is why flexible array members (6.7.2.1:18) were added.
Also note that appendix J.2 Undefined behavior calls out array access:
1 - The behavior is undefined in the following circumstances: [...]
— Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that does not point into, or just beyond, the same array
object.
— Addition or subtraction of a pointer into, or just beyond, an array object and an
integer type produces a result that points just beyond the array object and is used as
the operand of a unary * operator that is evaluated.
— An array subscript is out of range, even if an object is apparently accessible with the
given subscript (as in the lvalue expression a[1][7] given the declaration int
a[4][5]).
So, as you note this would be undefined behaviour:
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
foo->data[5] = 1;
However, you would be allowed to do the following:
MyStruct *foo = malloc(sizeof(MyStruct) + sizeof(int) * 10);
((int *) foo)[(offsetof(MyStruct, data) / sizeof(int)) + 5] = 1;
C++ is laxer in this regard; 3.9.2 Compound types [basic.compound] has:
3 - [...] If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
This makes sense considered in the light of C's more aggressive optimisation opportunities for pointers, e.g. with the restrict qualifier.
The C99 rationale document talks about this in section 6.7.2.1.
A new feature of C99: There is a common idiom known as the “struct hack” for creating a structure containing a variable-size array:
...
The validity of this construct has always been questionable. In the response to one Defect Report, the Committee decided that it was undefined behavior because the array p->items contains only one item, irrespective of whether the space exists. An alternative construct was
suggested: make the array size larger than the largest possible case (for example, using int items[INT_MAX];), but this approach is also undefined for other reasons.
The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced. Apart from the empty brackets, and the removal of the “-1” in the malloc call,
this is used in the same way as the struct hack, but is now explicitly valid code.
The struct hack is undefined behavior, as supported not only by the C specification itself (I'm sure there are citations in the other answers) but also by the committee, which has recorded its opinion.
So the answer is yes, it is undefined behavior according to the standard document, but it is well defined according to the de facto C standard. I imagine most compiler writers are intimately familiar with the hack. From GCC's tree-vrp.c:
/* Accesses after the end of arrays of size 0 (gcc
extension) and 1 are likely intentional ("struct
hack"). */
I think there's a good chance you might even find the struct hack in compiler test suites.