Pointer aliasing between struct and first member of struct [duplicate] - c

This question already has an answer here:
Struct Extension in C
(1 answer)
Closed 2 years ago.
Pointer aliasing in C is normally undefined behavior (because of strict aliasing), but C11 standard seems allow aliasing a pointer to struct and a pointer to the first member of the struct
C11 6.7.2.1 (15)...A pointer to a structure object... points to its initial member... and vice versa...
So does the following code contain undefined behavior?
struct Foo {
int x;
int y;
};
// does foe return always 100?
int foe() {
struct Foo foo = { .x = 10, .y = 20 }, *pfoo = &foo;
int *px = (int*)pfoo; *px = 100;
return pfoo->x;
}

This code is correct. All versions of Standard C and C++ allow this , although the wording varies.
There's no strict aliasing issue because you access an object of type int via an lvalue of type int. The strict aliasing rule may apply when the lvalue doing the access has a different type to the object stored at the memory location .
The text you quoted covers that the pointer cast actually points to the int object.

The way the Standard is written, an lvalue of a structure or union type may be used to access an object of member type, but there is no provision that would allow an arbitrary lvalue of struct or union's member type to access an object of the struct or union type. Because it would of course be absurd to say that code couldn't use a struct or union member lvalue (which would of course have that member's type) to access a struct or union, all compilers have supported some common access patterns. Because compilers allow such accesses under different circumstances, however, the Standard treats all support for such accesses as a Quality of Implementation issue rather than trying to specify exactly when such support is required.
The approach most consistent with the Standard's wording, and which would allow the most useful optimizations, while also supporting most code that would need to perform type punning or other techniques, would be to say that for purposes of N1570 6.5p7, a pointer which is visibly derived from a pointer or lvalue of a given type may be used within the context of such derivation to access things that would (for purposes of 6.5p7) be accessible using an lvalue of that type. Under such an approach, given a piece of code like:
struct foo { int index,len; int *dat; };
void test1(struct foo *p)
{
int *pp = &foo->len;
*pp = 4;
}
void test2(struct foo *p, int dat)
{
if (p->index < p->len)
{
p->dat[p->index] = dat;
p->index++;
}
}
should recognize that within test1, an access to *pp may access the struct foo object *p, since pp is visibly formed from p. On the other hand, the compiler would not be required to accommodate within test2 the possibility that an object of type struct foo, nor members thereof such as p->index, might be modified through the pointer p->dat, because nothing within test2 would cause the address of a struct foo or any portion thereof to be stored in p->dat.
Clang and gcc, however, instead opt for a different approach, behaving as though 6.5p7 allows struct members to be accessed via arbitrary pointers of their types, but union members can't be accessed via pointers at all, excluding the pointer arithmetic implied by bracketed array expressions. Given union { uint16_t h[4]; uint32_t w[2];} u; clang and gcc will recognize that an access to u.h[i] might interact with u.w[j], but will not recognize that *(u.h+i) might interact with *(u.w+j) even though the Standard defines the meaning of the former expressions with brackets as being equivalent to the latter forms.
Given that compilers consistently handle all of these constructs usefully when type-based aliasing is disabled. The Standard, however, doesn't impose any requirements even in many common cases, and clang and gcc make no promises about behavior of constructs not mandated by the Standard, even if all versions to date have handled such constructs usefully. Thus, I would not recommend relying upon clang or gcc to usefully process anything that involves accessing storage as different types at different times except when using -fno-strict-aliasing, and their wackiness isn't an issue when using that option, so I'd recommend simply using that option unless or until clang and gcc adopt a better defined abstraction.

Related

C: Is accessing initial member of nested struct using pointer cast to "outer" struct type defined? [duplicate]

This question already has answers here:
Are C-structs with the same members types guaranteed to have the same layout in memory?
(4 answers)
Closed 1 year ago.
I'm trying to understand the so-called "common initial sequence" rule for C aliasing analysis. This question does not concern C++.
Specifically, according to resources (for example the CPython PEP 3123),
[A] value of a struct type may also be accessed through a pointer to the first field. E.g. if a struct starts with an int, the struct * may also be cast to an int *, allowing to write int values into the first field.
(emphasis mine).
My question can be roughly phrased as "does the ability to access a struct by pointer to first-member-type pierce nested structs?" That is, what happens if access is via a pointer whose pointed-to type (let's say type struct A) isn't exactly the same type as that of the first member (let's say type struct B), but that pointed-to type (struct A) has common first initial sequence with struct B, and the "underlying" access is only done to that common initial sequence?
(I'm chiefly interested in structs, but I can imagine this question may also pertain to unions, although I imagine unions come with their own tricky bits w.r.t. aliasing.)
This phrasing may not clear, so I tried to illustrate my intention with the code as follows (also available at godbolt.org, and the code seem to compile just fine with the intended effect):
/* Base object as first member of extension types. */
struct base {
unsigned int flags;
};
/* Types extending the "base" by including it as first member */
struct file_object {
struct base attr;
int index;
unsigned int size;
};
struct socket_object {
struct base attr;
int id;
int type;
int status;
};
/* Another base-type with an additional member, but the first member is
* compatible with that of "struct base" */
struct extended_base {
unsigned int flags;
unsigned int mode;
};
/* A type that derives from extended_base */
struct extended_socket_object {
struct extended_base e_attr; /* Using "extended" base here */
int e_id;
int e_type;
int e_status;
int some_other_field;
};
/* Function intended for structs "deriving from struct base" */
unsigned int set_flag(struct base *objattr, unsigned int flag)
{
objattr->flags |= flag;
return objattr->flags;
}
extern struct file_object *file;
extern struct socket_object *sock;
extern struct extended_socket_object *esock;
void access_files(void)
{
/* Cast to pointer-to-first-member-type and use it */
set_flag((struct base *)file, 1);
set_flag((struct base *)sock, 1);
/* Question: is the following access defined?
* Notice that it's cast to (struct base *), rather than
* (struct extended_base *), although the two structs share the same common
* initial member and it is this member that's actually accessed. */
set_flag((struct base *)esock, 1);
return;
}
This is not safe as you're attempting to access an object of type struct extended_base as though it were an object of type struct base.
However, there are rules that allow access to two structures initial common sequence via a union. From section 6.5.2.3p6 of the C standard:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members
So if you change the definition of struct extended_socket_object to this:
struct extended_socket_object {
union u_base {
struct base b_attr;
struct extended_base e_attr;
};
int e_id;
int e_type;
int e_status;
int some_other_field;
};
Then a struct extended_socket_object * may be converted to union u_base * which may in turn be converted to a struct base *. This is allowed as per section 6.7.2.1 p15 and p16:
15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase
in the order in which they are declared. A pointer to a structure
object, suitably converted, points to its initial member (or
if that member is a bit-field, then to the unit in which it
resides), and vice versa. There may be unnamed padding within
a structure object, but not at its beginning.
16 The size of a union is sufficient to contain the largest of its members. The value of at most one of the
members can be stored in a union object at any time. A
pointer to a union object, suitably converted, points to each
of its members (or if a member is a bit-field, then to the
unit in which it resides), and vice versa.
It is then allowed to access b_attr->flags because of the union it resides in via 6.5.2.3p6.
According to the C Standard (6.7.2.1 Structure and union specifiers, paragraph 13):
A pointer to a structure object, suitably converted, points to its
initial member (or if that member is a bit-field, then to the unit in
which it resides), and vice versa.
So, converting esock to struct extended_base * and then converting it to unsigned int * must give us a pointer to the flags field, according to the Standard.
I'm not sure if converting to to struct base * counts as "suitably converted" or not. My guess is that it would work at any machine you will try it on, but I wouldn't recommend it.
I think it would be safest (and also make the code more clear) if you simply keep a member of type struct base inside struct extended_base (instead of the member of type unsigned int). After doing that, you have two options:
When you want to send it to a function, write explicitly: esock->e_attr.base (instead of (struct base *)esock). This is what I would recommend.
You can also write: (struct base *) (struct extended_base *) esock which is guaranteed to work, but I think it is less clear, and also more dangerous (if in the future you will want to add or accidentaly add another member in the beginning of the struct).
After reading up into the standard's text following the other answers (thanks!!) I think I may try to answer my own question (which was a bit misleading to begin with, see below)
As the other answers pointed out, there appear to be two somewhat overlapping concerns in this question -
"common initial sequence" -- in the standard documents this specifically refers to the context of a union having several structs as member and when these member structs share some compatible members beginning from the first. (§6.5.2.3 " Structure and union members", p6 -- Thanks, #dbush!).
My reading: the language spec suggests that, if at the site of access to these "apparently" different structs it is made clear that they actually belong to the same union, and that the access is done through the union, it is permitted; otherwise, it is not.
I think the requirement is meant to work with type-based aliasing rules: if these structs do indeed alias each other, this fact must be made clear at compile time (by involving the union). When the compiler sees pointers to different types of structs, it can't, in the most general case, deduce whether they may have belonged to some union somewhere. In that case, if it invokes type-based alias analysis, the code will be miscompiled. So the standard requires that the union is made visible.
"a pointer (to struct), when suitably converted, points to its initial member" (§6.7.2.1 "Structure and union specifiers", p15) -- this sounds tantalizingly close to 1., but it's less about aliasing than about a) the implementation requirements for struct and b) "suitable conversion" of pointers. (Thanks, #Orielno!)
My reading: the "suitable conversion" appears to mean "see everything else in the standard", that is, no matter if the "conversion" is performed by type cast or assignment (or a series of them), being "suitable" suggests "all constraints must be satisfied at all steps". The "initial-member" rule, I think, simply says that the actual location of the struct is exactly the same as the initial member: there cannot be padding in front of the first member (this is explicitly stated in the same paragraph).
But no matter how we make use of this fact to convert pointers, the code must still be subject to constraints governing conversion, because a pointer is not just a machine representation of some location -- its value still has to be correctly interpreted in the context of types. A counterexample would be a conversion involving an assignment that discards const from the pointed-to type: this violates a constraint and cannot be suitable.
The somewhat misleading thing in my original post was to suggest that rule 2 had something to do with "common initial sequence", where it is not directly related to that concept.
So for my own question, I tend to answer, to my own surprise, "yes, it is valid". The reason is that the pointer conversion by cast in expression (struct base *)esock is "legal in the letter of the law" -- the standard simply says that (§6.5.4 "Cast operators", p3)
Conversions that involve pointers, other than where permitted by the constraints of 6.5.16.1 (note: constraints governing simple assignment), shall be specified by means of an explicit cast.
Since the expression is indeed an explicit cast, in and by itself it doesn't contradict the standard. The "conversion" is "suitable". Further function call to set_flag() correctly dereferences the pointer by virtue of the suitable conversion.
But! Indeed the "common initial sequence" becomes important when we want to improve the code. For example, in #dbush's answer, if we want to "inherit from multiple bases" via union, we must make sure that access to base is done where it's apparent that the struct is a member of the union. Also, as #Orielno pointed out, when the code makes us worry about its validity, perhaps switching to an explicitly safe alternative is better even if the code is valid in the first place.
In the language the C Standard was written to describe, an lvalue of the form ptr->memberName would use ptr's type to select a namespace in which to look up memberName, add the offset of that member to the address in ptr, and then access an object of that member type at that address. Once the address and type of the member were determined, the original structure object would play no further rule in the processing of the expression.
When C99 was being written, there was a desire to avoid requiring that a compiler given something like:
struct position {double x,y,z; };
struct velocity {double dx,dy,dz; };
void update_positions(struct positions *pp, struct velocity *vv, int count)
{
for (int i=0; i<count; i++)
{
positions[i].x += vv->dx;
positions[i].y += vv->dy;
positions[i].z += vv->dz;
}
}
must allow for the possibility that a write to e.g. positions[i].y might affect the object of vv->dy even when there is no evidence of any relationship between any object of type struct position and any object of type struct velocity. The Committee agreed that compilers shouldn't be required to accommodate interactions between different structure types in such cases.
I don't think anyone would have seriously disputed the notion that in situations where storage is accessed using a pointer which is freshly and visibly converted from one structure type to another, a quality compiler should accommodate the possibility that the operation might access a structure of the original type. The question of exactly when an implementation would accommodate such possibilities should depend upon what its customers were expecting to do, and was thus left as a quality-of-implementation issue outside the Standard's jurisdiction. The Standard wouldn't forbid implementations from being willfully blind to even the most obvious cases, but that's because the dumber something would be, the less need there should be to prohibit it.
Unfortunately, the authors of clang and gcc have misinterpreted the Standard's failure to forbid them from being obtusely blind to the possibility that a freshly-type-converted pointer might be used to access the same object as a pointer of the original type, as an invitation to behave in such fashion. When using clang or gcc to process any code which would need to make use of the Common Initial Sequence guarantees, one must use -fno-strict-aliasing. When using optimization without that flag, both clang nor gcc are prone to behave in ways inconsistent with any plausible interpretation of the Standard's intent. Whether one views such behaviors as being a result of a really weird interpretation of the Standard, or simply as bugs, I see no reason to expect that gcc or clang will ever behave meaningfully in such cases.

Is it legal to access struct members via offset pointers from other struct members?

In these two examples, does accessing members of the struct by offsetting pointers from other members result in Undefined / Unspecified / Implementation Defined Behavior?
struct {
int a;
int b;
} foo1 = {0, 0};
(&foo1.a)[1] = 1;
printf("%d", foo1.b);
struct {
int arr[1];
int b;
} foo2 = {{0}, 0};
foo2.arr[1] = 1;
printf("%d", foo2.b);
Paragraph 14 of C11 § 6.7.2.1 seems to indicate that this should be implementation-defined:
Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
and later goes on to say:
There may be unnamed padding within a structure object, but not at its beginning.
However, code like the following appears to be fairly common:
union {
int arr[2];
struct {
int a;
int b;
};
} foo3 = {{0, 0}};
foo3.arr[1] = 1;
printf("%d", foo3.b);
(&foo3.a)[1] = 2; // appears to be illegal despite foo3.arr == &foo3.a
printf("%d", foo3.b);
The standard appears to guarantee that foo3.arr is the same as &foo3.a, and it doesn't make sense that referring to it one way is legal and the other not, but equally it doesn't make sense that adding the outer union with the array should suddenly make (&foo3.a)[1] legal.
My reasoning for thinking the first examples must also therefore be legal:
foo3.arr is guaranteed to be the same as &foo.a
foo3.arr + 1 and &foo3.b point to the same memory location
&foo3.a + 1 and &foo3.b must therefore point to the same memory location (from 1 and 2)
struct layouts are required to be consistent, so &foo1.a and &foo1.b should be laid out exactly the same as &foo3.a and &foo3.b
&foo1.a + 1 and &foo1.b must therefore point to the same memory location (from 3 and 4)
I've come across some outside sources that suggest that both the foo3.arr[1] and (&foo3.a)[1] examples are illegal, however I haven't been able to find a concrete statement in the standard that would make it so.
Even if they were both illegal though, it's also possible to construct the same scenario with flexible array pointers which, as far as I can tell, does have standard-defined behavior.
union {
struct {
int x;
int arr[];
};
struct {
int y;
int a;
int b;
};
} foo4;
The original application is considering whether or not a buffer overflow from one struct field into another is strictly speaking defined by the standard:
struct {
char buffer[8];
char overflow[8];
} buf;
strcpy(buf.buffer, "Hello world!");
println(buf.overflow);
I would expect this to output "rld!" on nearly any real-world compiler, but is this behavior guaranteed by the standard, or is it an undefined or implementation-defined behavior?
Introduction: The standard is inadequate in this area, and there is decades of history of argument on this topic and strict aliasing with no convincing resolution or proposal to fix.
This answer reflects my view rather than any imposition of the Standard.
Firstly: it's generally agreed that the code in your first code sample is undefined behaviour due to accessing outside the bounds of an array via direct pointer arithmetic.
The rule is C11 6.5.6/8 . It says that indexing from a pointer must remain within "the array object" (or one past the end). It doesn't say which array object but it is generally agreed that in the case int *p = &foo.a; then "the array object" is foo.a, and not any larger object of which foo.a is a subobject.
Relevant links:
one, two.
Secondly: it's generally agreed that both of your union examples are correct. The standard explicitly says that any member of a union may be read; and whatever the contents of the relevant memory location are are interpreted as the type of the union member being read.
You suggest that the union being correct implies that the first code should be correct too, but it does not. The issue is not with specifying the memory location read; the issue is with how we arrived at the expression specifying that memory location.
Even though we know that &foo.a + 1 and &foo.b are the same memory address, it's valid to access an int through the second and not valid to access an int through the first.
It's generally agreed that you can access the int by computing its address in other ways that don't break the 6.5.6/8 rule, e.g.:
((int *)((char *)&foo + offsetof(foo, b))[0]
or
((int *)((uintptr_t)&foo.a + sizeof(int)))[0]
Relevant links: one, two
It's not generally agreed on whether ((int *)&foo)[1] is valid. Some say it's basically the same as your first code, since the standard says "a pointer to an object, suitably converted, points to the element's first object". Others say it's basically the same as my (char *) example above because it follows from the specification of pointer casting. A few even claim it's a strict aliasing violation because it aliases a struct as an array.
Maybe relevant is N2090 - Pointer provenance proposal. This does not directly address the issue, and doesn't propose a repeal of 6.5.6/8.
According to C11 draft N1570 6.5p7, an attempt to access the stored value of a struct or union object using anything other than an lvalue of character type, the struct or union type, or a containing struct or union type, invokes UB even if behavior would otherwise be fully described by other parts of the Standard. This section contains no provision that would allow an lvalue of a non-character member type (or any non-character numeric type, for that matter) to be used to access the stored value of a struct or union.
According to the published Rationale document, however, the authors of the Standard recognized that different implementations offered different behavioral guarantees in cases where the Standard imposed no requirements, and regarded such "popular extensions" as a good and useful thing. They judged that questions of when and how such extensions should be supported would be better answered by the marketplace than by the Committee. While it may seem weird that the Standard would allow an obtuse compiler to ignore the possibility that someStruct.array[i] might affect the stored value of someStruct, the authors of the Standard recognized that any compiler whose authors aren't deliberately obtuse will support such a construct whether the Standard mandates or not, and that any attempt to mandate any kind of useful behavior from obtusely-designed compilers would be futile.
Thus, a compiler's level of support for essentially anything having to do with structures or unions is a quality-of-implementation issue. Compiler writers who are focused on being compatible with a wide range of programs will support a wide range of constructs. Those which are focused on maximizing the performance of code that needs only those constructs without which the language would be totally useless, will support a much narrower set. The Standard, however, is devoid of guidance on such issues.
PS--Compilers that are configured to be compatible with MSVC-style volatile semantics will interpret that qualifier as a indicating that an access to the pointer may have side-effects that interact with objects whose address has been taken and that aren't guarded by restrict, whether or not there is any other reason to expect such a possibility. Use of such a qualifier when accessing storage in "unusual" ways may make it more obvious to human readers that the code is doing something "weird" at the same time as it will thus ensure compatibility with any compiler that uses such semantics, even if such compiler would not otherwise recognize that access pattern. Unfortunately, some compiler writers refuse to support such semantics at anything other than optimization level 0 except with programs that demand it using non-standard syntax.

Does access through pointer change strict aliasing semantics?

With these definitions:
struct My_Header { uintptr_t bits; }
struct Foo_Type { struct My_Header header; int x; }
struct Foo_Type *foo = ...;
struct Bar_Type { struct My_Header header; float x; }
struct Bar_Type *bar = ...;
Is it correct to say that this C code ("case one"):
foo->header.bits = 1020;
...is actually different semantically from this code ("case two"):
struct My_Header *alias = &foo->header;
alias->bits = 1020;
My understanding is that they should be different:
Case One considers the assignment unable to affect the header in a Bar_Type. It only is seen as being able to influence the header in other Foo_Type instances.
Case Two, by forcing access through a generic aliasing pointer, will cause the optimizer to realize all bets are off for any type which might contain a struct My_Header. It would be synchronized with access through any pointer type. (e.g. if you had a Foo_Type which was pointing to what was actually a Bar_Type, it could access through the header and reliably find out which it had--assuming that's something the header bits could tell you.)
This relies on the optimizer not getting "smart" and making case two back into case one.
The code bar->header.bits = 1020; is exactly identical to struct My_Header *alias = &bar->header; alias->bits = 1020;.
The strict aliasing rule is defined in terms of access to objects through lvalues:
6.5p7 An object shall have its stored value accessed
only by an lvalue expression that has one of the following
types:
The only things that matter are the type of the lvalue, and the effective type of the object designated by the lvalue. Not whether you stored some intermediate stages of the lvalue's derivation in a pointer variable.
NOTE: The question was edited since the following text was posted. The following text applies to the original question where the space was allocated by malloc, not the current question as of August 23.
Regarding whether the code is correct or not. Your code is equivalent to Q80 effective_type_9.c in N2013 rev 1571, which is a survey of existing C implementations with an eye to drafting improved strict aliasing rules.
Q80. After writing a structure to a malloc’d region, can its members be accessed via a pointer to a different structure type that has the same leaf member type at the same offset?
The stumbling block is whether the code (*bar).header.bits = 1020; sets the effective type of only the int bits; or of the entire *bar. And accordingly, whether reading (*foo).header.bits reads an int, or does it read the entire *foo?
Reading only an int would not be a strict aliasing violation (it's OK to read int as int); but reading a Bar_Struct as Foo_Struct would be a violation.
The authors of this paper consider the write to set the effective type for the entire *bar, although they don't give their justification for that, and I do not see any text in the C Standard to support that position.
It seems to me there's no definitive answer currently for whether or not your code is correct.
The fact that you have two structures which contain My_Header is a red herring and complicates your thinking without bringing anything new to the table. Your problem can be stated and clarified without any struct (other than My_Header ofcourse).
foo->header.bits = 1020;
The compiler clearly knows which object to modify.
struct My_Header *alias = &foo->header;
alias->bits = 1020;
Again the same is true here: with a very rudimentary analysis the compiler knows exactly which object the alias->bits = 1020; modifies.
The interesting part comes here:
void foo(struct My_Header* p)
{
p->bits = 1020;
}
In this function the pointer p can alias any object (or sub-object) of type My_header. It really doesn't matter if you have N structures who contain My_header members or if you have none. Any object of type My_Header could be potentially modified in this function.
E.g.
// global:
struct My_header* global_p;
void foo(struct My_Header* p)
{
p->bits = 1020;
global_p->bits = 15;
return p->bits;
// the compiler can't just return 1020 here because it doesn't know
// if `p` and `global_p` both alias the same object or not.
}
To convince you that the Foo_Type and Bar_Type are red herrings and don't matter look at this example for which the analysis is identical to the previous case who doesn't involve neither Foo_Type nor Bar_type:
// global:
struct My_header* gloabl_p;
void foo(struct Foo_Type* foo)
{
foo->header.bits = 1020;
global_p->bits = 15;
return foo->header.bits;
// the compiler can't just return 1020 here because it doesn't know
// if `foo.header` and `global_p` both alias the same object or not.
}
The way N1570 p5.6p7 is written, the behavior of code that accesses individual members of structures or unions will only be defined if the accesses are performed using lvalues of character types, or by calling library functions like memcpy. Even if a struct or union has a member of type T, the Standard (deliberately IMHO) refrains from giving blanket permission to access that part of the aggregate's storage using seemingly-unrelated lvalues of type T. Presently, gcc and clang seem to grant blanket permission for accessing structs, but not unions, using lvalues of member type, but N1570 p5.6p7 doesn't require that. It applies the same rules to both kinds of aggregates and their members. Because the Standard doesn't grant blanket permission to access structures using unrelated lvalues of member type, and granting such permission impairs useful optimizations, there's no guarantee gcc and clang will continue this behavior with with unrelated lvalues of member types.
Unfortunately, as can be demonstrated using unions, gcc and clang are very poor at recognizing relationships among lvalues of different types, even when one lvalue is quite visibly derived from the other. Given something like:
struct s1 {short x; short y[3]; long z; };
struct s2 {short x; char y[6]; };
union U { struct s1 v1; struct s2 v2; } unionArr[100];
int i;
Nothing in the Standard would distinguish between the "aliasing" behaviors of the following pairs of functions:
int test1(int i)
{
return unionArr[i].v1.x;
}
int test2a(int j)
{
unionArr[j].v2.x = 1;
}
int test2a(int i)
{
struct s1 *p = &unionArr[i].v1;
return p->x;
}
int test2b(int j)
{
struct s2 *p = &unionArr[j].v2;
p->x = 1;
}
Both of them use an lvalue of type int to access the storage associated with objects of type struct s1, struct s2, union U, and union U[100], even though int is not listed as an allowable type for accessing any of those.
While it may seem absurd that even the first form would invoke UB, that shouldn't be a problem if one recognizes support for access patterns beyond those explicitly listed in the Standard as a Quality of Implementation issue. According to the published rationale, the authors of the Standard thought compiler writers would to try to produce high-quality implementations, and it was thus not necessary to forbid "conforming" implementations from being of such low quality as to be useless. An implementation could be "conforming" without being able to handle test1a() or test2b() in cases where they would access member v2.x of a union U, but only in the sense that an implementation could be "conforming" while being incapable of correctly processing anything other than some particular contrived and useless program.
Unfortunately, although I think the authors of the Standard would likely have expected that quality implementations would be able to handle code like test2a()/test2b() as well as test1a()/test1b(), neither gcc nor clang supports them pattern reliably(*). The stated purpose of the aliasing rules is to avoid forcing compilers to allow for aliasing in cases where there's no evidence of it, and where the possibility of aliasing would be "dubious" [doubtful]. I've seen no evidence that they intended that quality compilers wouldn't recognize that code which takes the address of unionArr[i].v1 and uses it is likely to access the same storage as other code that uses unionArr[i] (which is, of course, visibly associated with unionArr[i].v2). The authors of gcc and clang, however, seem to think it's possible for something to be a quality implementation without having to consider such things.
(*) Given e.g.
int test(int i, int j)
{
if (test2a(i))
test2b(j);
return test2a(i);
}
neither gcc nor clang will recognize that if i==j, test2b(j) would access the same storage as test2a(i), even when though both would access the same element of the same array.

Is a redeclaration of an untagged structure a compatible type?

For purposes expressed in this question, we want to do this:
typedef struct { int a; } A;
typedef struct { struct { int a; }; int b; } B;
A *BToA(B *b) { return (A *) b; }
B *AToB(A *a) { return (B *) a; }
The desire is that the casts conform to C 2011 6.7.2.1 15, which says “A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa.”
Since the struct { int a; } inside B does not have a name, let’s call it A'.
“Suitably” is not explicitly defined. I presume that if b is a valid pointer to an object of type A', then (A *) b performs a suitable conversion, and, similarly, if a is a pointer to an A' that is in a B, then (B *) a is a suitable conversion.
So the question is: Is A * a valid pointer to an object of type A'?
Per 6.7.6.1, A * is compatible with A' * if A is compatible with A'.
Per 6.2.7, “Two types have compatible type if their types are the same… Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order…”
These cannot be the same type by 6.7.2.3 5: “Each declaration of a structure, union, or enumerated type which does not include a tag declares a distinct type.”
Since they are not the same type, are they compatible? The text in 6.2.7 says they are compatible if declared in separate translation units, but these are in the same translation unit.
As you laid out in the question, the standard clearly and unambiguously says that two struct definitions struct { int a; } in the same translation unit declare two incompatible types. Notwithstanding the fact that this might be "weird". Compilers have always followed the standard.
This seems like reasonable behaviour to me: if you happen to have semantically unrelated structs in your project that coincidentally have a member list with the same types, you do want the compiler to reject an accidental assignment between the two.
Re. the code in your question, according to 6.7.2.1/13,
The members of an anonymous structure or union are considered to be members of the containing structure or union.
So I would treat the definition of B as being equivalent to:
typedef struct { int a; int b; } B;
for purposes of further analysis.
I haven't seen anything in the standard that says that both struct are compatible and thus I would say that they are not.
The only thing that could get you a limited compatibility between the structures is the use of an union, as mentioned in 6.7.2.1§6:
One special guarantee is made in order to simplify the use of unions: if a union contains
several structures that share a common initial sequence (see below), and if the union
object currently contains one of these structures, it is permitted to inspect the common
initial part of any of them anywhere that a declaration of the completed type of the union
is visible.
i.e., something like
typedef struct { int a; } A;
typedef struct { union { struct { int a; }; A export; }; int b; } B;
A *BToA(B *b) { return &b->export; }
B *AToB(A *a) { return (B *) a; }
should be safe, but for read accesses only: the standard did not really bother to specify what "inspecting" the common initial sequence means, but seems to use it in opposition to "modify".
There are two situations in which compatibility of structures is relevant:
In deciding whether values or pointers of one type may be coerced into values or pointers of the other, without use of the casting operator and without producing a diagnostic. Note that for this purpose, separately-declared structures are incompatible even if they are structurally identical, but that this kind of compatibility is irrelevant when passing structures or pointers between compilation units.
In deciding whether a value of, or pointer to one type, may be safely given to code which expects the other. Passing untagged structures between compilation units would be impossible if structurally-identical types were not regarded as compatible for this purpose. Compilers used to regard structurally-identical types as compatible for this purpose even within a compilation unit, and there was never any good reason for compilers to do otherwise in cases where one or both types are untagged, but because the Standard doesn't mandate such treatment it has become fashionable for compilers to senselessly weaken the language by blithely assuming that a pointer to one such type won't be used to access members of another.
Unfortunately, when the Standard was written, its authors didn't think it was important to expressly mandate all the obviously-useful things that compilers were already doing, and which sensible compilers would continue to do. The net result is that useful constructs which used to be supported and non-controversial will be unreliable unless otherwise-useful optimizations are disabled.

Clarification on an example of unions in C11 standard

The following example is given in the C11 standard, 6.5.2.3
The following is not a valid fragment (because the union type is not
visible within function f):
struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g()
{
union {
struct t1 s1;
struct t2 s2;
} u;
/* ... */
return f(&u.s1, &u.s2);
}
Why does it matter that the union type is visible to the function f?
In reading through the relevant section a couple times, I could not see anything in the containing section disallowing this.
It matters because of 6.5.2.3 paragraph 6 (emphasis added):
One special guarantee is made in order to simplify the use of unions:
if a union contains several structures that share a common initial
sequence (see below), and if the union object currently contains one
of these structures, it is permitted to inspect the common initial
part of any of them anywhere that a declaration of the completed type
of the union is visible. Two structures share a common initial
sequence if corresponding members have compatible types (and, for
bit-fields, the same widths) for a sequence of one or more initial
members.
It's not an error that requires a diagnostic (a syntax error or constraint violation), but the behavior is undefined because the m members of the struct t1 and struct t2 objects occupy the same storage, but because struct t1 and struct t2 are different types the compiler is permitted to assume that they don't -- specifically that changes to p1->m won't affect the value of p2->m. The compiler could, for example, save the value of p1->m in a register on first access, and then not reload it from memory on the second access.
Note: This answer doesn't directly answer your question but I think it is relevant and is too big to go in comments.
I think the example in the code is actually correct. It's true that the union common initial sequence rule doesn't apply; but nor is there any other rule which would make this code incorrect.
The purpose of the common initial sequence rule is to guarantee the same layout of the structs. However that is not even an issue here, as the structs only contain a single int, and structs are not permitted to have initial padding.
Note that , as discussed here, sections in ISO/IEC documents titled Note or Example are "non-normative" which means they do not actually form a part of the specification.
It has been suggested that this code violates the strict aliasing rule. Here is the rule, from C11 6.5/7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
[...]
In the example, the object being accessed (denoted by p2->m or p1->m) have type int. The lvalue expressions p1->m and p2->m have type int. Since int is compatible with int, there is no violation.
It's true that p2->m means (*p2).m, however this expression does not access *p2. It only accesses the m.
Either of the following would be undefined:
*p1 = *(struct t1 *)p2; // strict aliasing: struct t2 not compatible with struct t1
p2->m = p1->m++; // object modified twice without sequence point
Given the declarations:
union U { int x; } u,*up = &u;
struct S { int x; } s,*sp = &s;
the lvalues u.x, up->x, s.x, and sp->x are all of type int, but any access to any of those lvalues will (at least with the pointers initialized as shown) will also access the stored value of an object of type union U or struct S. Since N1570 6.5p7 only allows objects of those types to be accessed via lvalues whose types are either character types, or other structs or unions that contain objects of type union U and struct S, it would not impose any requirements about the behavior of code that attempts to use any of those lvalues.
I think it's clear that the authors of the Standard intended that compilers allow objects of struct or union types to be accessed using lvalues of member type in at least some circumstances, but not necessarily that they allow arbitrary lvalues of member type to access objects of struct or union types. There is nothing normative to differentiate the circumstances where such accesses should be allowed or disallowed, but there is a footnote to suggest that the purpose of the rule is to indicate when things may or may not alias.
If one interprets the rule as only applying in cases where lvalues are used in ways that alias seemingly-unrelated lvalues of other types, such an interpretation would define the behavior of code like:
struct s1 {int x; float y;};
struct s2 {int x; double y;};
union s1s2 { struct s1 v1; struct s2 v2; };
int get_x(void *p) { return ((struct s1*)p)->x; }
when the latter was passed a struct s1*, struct s2*, or union s1s2* that identifies an object of its type, or the freshly-derived address of either member of union s1s2. In any context where an implementation would see enough to have reason to care about whether operations on the original and derived lvalues would affect each other, it would be able to see the relationship between them.
Note, however, that that such an implementation would not be required to allow for the possibility of aliasing in code like the following:
struct position {double px,py,pz;};
struct velocity {double vx,vy,vz;};
void update_vectors(struct position *pos, struct velocity *vel, int n)
{
for (int i=0; i<n; i++)
{
pos[i].px += vel[i].vx;
pos[i].py += vel[i].vy;
pos[i].pz += vel[i].vz;
}
}
even though the Common Initial Sequence guarantee would seem to allow for that.
There are many differences between the two examples, and thus many indications that a compiler could use to allow for the realistic possibility of the first code is passed a struct s2*, it might accessing a struct s2, without having to allow for the more dubious possibility that operations upon pos[] in the second examine might affect elements of vel[].
Many implementations seeking to usefully support the Common Initial Sequence rule in useful fashion would be able to handle the first even if no union type were declared, and I don't know that the authors of the Standard intended that merely adding a union type declaration should force compilers to allow for the possibility of arbitrary aliasing among common initial sequences of members therein. The most natural intention I can see for mentioning union types would be that compilers which are unable to perceive any of the numerous clues present in the first example could use the presence or absence of any complete union type declaration featuring two types as an indication of whether lvalues of one such type might be used to access another.
Note neither N1570 P6.5p7 nor its predecessors make any effort to describe all cases where quality implementations should behave predictably when given code that uses aggregates. Most such cases are left as Quality of Implementation issues. Since low-quality-but-conforming implementations are allowed to behave nonsensically for almost any reason they see fit, there was no perceived need to complicate the Standard with cases that anyone making a bona fide effort to write a quality implementation would handle whether or not it was required for conformance.

Resources