Union of structs with common first member - c

I am unsure whether the code below has pointer aliasing problems (or other standard conformance issues) in the casts inside the asserts. It seems that a pointer to the union type should be convertible to a pointer to the first member, and since the union is composed only of these two structs, I think a cast to the first member should work, but I'm not sure if this is correct or if I'm glossing over padding details in the process. Are unions required to pad the upper bits?
It seems that this is unspecified behavior? Does anyone have any insight as to whether this is supported? I know that there is an alternative standard way of doing this by using a struct with an enum type field and a struct container_storage member, but it seems like a waste of space considering that this information is already in struct contained.
compilation command in linux: gcc -std=c99 -Wextra -pedantic -fstrict-aliasing test.c && ./a.out && echo $? returns 0
#include <stdlib.h>
#include <assert.h>
enum type {type_a = 1, type_b = 2};
struct contained {
int some_other_field;
enum type type;
};
struct container_a {
struct contained contained;
int test;
};
struct container_b {
struct contained contained;
char test;
};
union container_storage {
struct container_a container_a;
struct container_b container_b;
};
int
main(int argc, char **argv)
{
union container_storage a =
{.container_a = {.contained = {.type = type_a}, .test = 42}};
union container_storage b =
{.container_b = {.contained = {.type = type_b}, .test = 'b'}};
assert(((struct contained *)&a)->type == type_a);
assert(((struct contained *)&b)->type == type_b);
return EXIT_SUCCESS;
}

That should be fine. C11, 6.5.2.3/6 ("Structure and union members") says:
One special guarantee is made in order to simplify the use of unions: if a union contains
several structures that share a common initial sequence (see below), and if the union
object currently contains one of these structures, it is permitted to inspect the common
initial part of any of them anywhere that a declaration of the completed type of the union
is visible. Two structures share a common initial sequence if corresponding members
have compatible types (and, for bit-fields, the same widths) for a sequence of one or more
initial members.
(C++ makes the same guarantee (C++11, 9.2/18) for standard-layout unions.)
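A minimal sketch of what the quoted paragraph permits, reusing the question's types. The helpers `storage_type` and `make_b` are hypothetical names introduced only for illustration. The read goes through the `container_a` member regardless of which structure the union currently holds, and the access is written through the union type itself, which is the form every reading of the rule accepts.

```c
#include <assert.h>
#include <stddef.h>

enum type { type_a = 1, type_b = 2 };

struct contained {
    int some_other_field;
    enum type type;
};

struct container_a {
    struct contained contained;
    int test;
};

struct container_b {
    struct contained contained;
    char test;
};

union container_storage {
    struct container_a container_a;
    struct container_b container_b;
};

/* Hypothetical helper: inspects the common initial part (the "contained"
 * member) through container_a, whichever member was stored last. */
enum type storage_type(const union container_storage *s)
{
    assert(s != NULL);
    return s->container_a.contained.type;
}

/* Hypothetical helper: builds a union that currently holds a container_b. */
union container_storage make_b(char c)
{
    union container_storage s =
        { .container_b = { .contained = { .type = type_b }, .test = c } };
    return s;
}
```

Calling `storage_type` on a value built by `make_b` reads `type` through the "other" structure, which is exactly the case the special guarantee covers.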

Unions don't pad in front of their members; they just overlay them. The first member of any struct is guaranteed to start right at the beginning, without padding. In general, structs that start with the same members of the same types are guaranteed to have the same layout for that initial part.
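The layout claims above can be checked with a small sketch. The function name `layout_is_overlaid` is hypothetical, and the struct definitions mirror the question's (with `enum type` simplified to `int`). It compares member addresses and uses `offsetof` for the first struct member.

```c
#include <assert.h>
#include <stddef.h>

struct contained { int some_other_field; int type; };
struct container_a { struct contained contained; int test; };
struct container_b { struct contained contained; char test; };

union container_storage {
    struct container_a container_a;
    struct container_b container_b;
};

/* Returns 1 when every union member starts at the union's own address
 * and the first member of each struct sits at offset 0. */
int layout_is_overlaid(void)
{
    union container_storage u;
    int same_address = (void *)&u == (void *)&u.container_a
                    && (void *)&u == (void *)&u.container_b;
    int no_leading_padding = offsetof(struct container_a, contained) == 0
                          && offsetof(struct container_b, contained) == 0;

    /* A union is at least as large as its largest member. */
    assert(sizeof u >= sizeof(struct container_a));
    return same_address && no_leading_padding;
}
```

Note that this only checks the start of the objects: a union may still have trailing padding, so its size can exceed that of its largest member.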

Under C89, a pointer of structure type which identifies a member of a union may be used to inspect any member which is part of a Common Initial Sequence shared with the type of data stored therein. This in turn generally implies that a pointer to any structure type could be used to inspect any member of the Common Initial Sequence shared with any other type (such behavior would have been unambiguously defined if the object happened to be a member of a declared union object, and the only practical way for a compiler to yield the required behavior in those cases would be to uphold it for all).
C99 added an additional requirement that the CIS guarantees only apply when a complete union type containing both structures is visible, which some compiler writers seem to think means it only applies to accesses performed directly through union types. The authors of such compilers seem to think that code which needs to handle structures with a common header like:
struct smallThing { void *next; uint16_t length; uint8_t dat[2]; };
struct bigThing { void *next; uint16_t length; uint8_t dat[65528]; };
should either extract the header into its own structure:
struct uHeader { void *next; uint16_t length; };
struct smallThing { struct uHeader head; uint8_t dat[2]; };
struct bigThing { struct uHeader head; uint8_t dat[65528]; };
or use union-type objects for everything, even though using uHeader would increase the size of struct smallThing by 50% (and totally break any code that had been reliant upon its layout), and using unions for everything when most objects only need to be small would increase memory usage a thousandfold.
If one needs code to be compatible with compilers that essentially ignore the Common Initial Sequence rule, one should regard the Common Initial Sequence rule as essentially useless. Personally, I think it would be better to document that only compilers that honor the CIS should be considered suitable for use with one's code, rather than bending over backward to accommodate unsuitable compilers, but I think it's important to be aware that compilers like the latter ones exist.
So far as I can tell, clang and gcc do not honor the CIS rule in any useful way except when the -fno-strict-aliasing flag is set. I don't know about other compilers.
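The size cost of factoring out the header can be measured directly. This is a rough sketch: the names `smallFlat`, `smallNested`, and the three helper functions are hypothetical, chosen so both layouts can coexist in one file. The actual numbers depend on the platform's pointer size and alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flat layout: the header fields are repeated inside the structure, so
 * dat[] can occupy what would otherwise be padding after length. */
struct smallFlat { void *next; uint16_t length; uint8_t dat[2]; };

/* Nested layout: the header is its own structure and is padded to its
 * own alignment before dat[] begins. */
struct uHeader { void *next; uint16_t length; };
struct smallNested { struct uHeader head; uint8_t dat[2]; };

size_t flat_size(void)   { return sizeof(struct smallFlat); }
size_t nested_size(void) { return sizeof(struct smallNested); }

/* Extra bytes paid per object for the nested layout. */
size_t nested_overhead(void)
{
    assert(nested_size() >= flat_size());
    return nested_size() - flat_size();
}
```

On a typical ABI with 8-byte, 8-byte-aligned pointers one would expect 16 bytes for the flat layout and 24 for the nested one, which matches the 50% figure quoted above.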

Related

C: Is accessing initial member of nested struct using pointer cast to "outer" struct type defined? [duplicate]

I'm trying to understand the so-called "common initial sequence" rule for C aliasing analysis. This question does not concern C++.
Specifically, according to resources (for example the CPython PEP 3123),
[A] value of a struct type may also be accessed through a pointer to the first field. E.g. if a struct starts with an int, the struct * may also be cast to an int *, allowing to write int values into the first field.
(emphasis mine).
My question can be roughly phrased as "does the ability to access a struct by pointer to first-member-type pierce nested structs?" That is, what happens if access is via a pointer whose pointed-to type (let's say type struct A) isn't exactly the same type as that of the first member (let's say type struct B), but that pointed-to type (struct A) has common first initial sequence with struct B, and the "underlying" access is only done to that common initial sequence?
(I'm chiefly interested in structs, but I can imagine this question may also pertain to unions, although I imagine unions come with their own tricky bits w.r.t. aliasing.)
This phrasing may not be clear, so I tried to illustrate my intention with the following code (also available at godbolt.org; the code seems to compile just fine with the intended effect):
/* Base object as first member of extension types. */
struct base {
unsigned int flags;
};
/* Types extending the "base" by including it as first member */
struct file_object {
struct base attr;
int index;
unsigned int size;
};
struct socket_object {
struct base attr;
int id;
int type;
int status;
};
/* Another base-type with an additional member, but the first member is
* compatible with that of "struct base" */
struct extended_base {
unsigned int flags;
unsigned int mode;
};
/* A type that derives from extended_base */
struct extended_socket_object {
struct extended_base e_attr; /* Using "extended" base here */
int e_id;
int e_type;
int e_status;
int some_other_field;
};
/* Function intended for structs "deriving from struct base" */
unsigned int set_flag(struct base *objattr, unsigned int flag)
{
objattr->flags |= flag;
return objattr->flags;
}
extern struct file_object *file;
extern struct socket_object *sock;
extern struct extended_socket_object *esock;
void access_files(void)
{
/* Cast to pointer-to-first-member-type and use it */
set_flag((struct base *)file, 1);
set_flag((struct base *)sock, 1);
/* Question: is the following access defined?
* Notice that it's cast to (struct base *), rather than
* (struct extended_base *), although the two structs share the same common
* initial member and it is this member that's actually accessed. */
set_flag((struct base *)esock, 1);
return;
}
This is not safe as you're attempting to access an object of type struct extended_base as though it were an object of type struct base.
However, there are rules that allow access to the common initial sequence of two structures via a union. From section 6.5.2.3p6 of the C standard:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
So if you change the definition of struct extended_socket_object to this:
struct extended_socket_object {
union u_base {
struct base b_attr;
struct extended_base e_attr;
} attr; /* member name added here: a tagged union with no declarator would declare no member */
int e_id;
int e_type;
int e_status;
int some_other_field;
};
Then a struct extended_socket_object * may be converted to union u_base * which may in turn be converted to a struct base *. This is allowed as per section 6.7.2.1 p15 and p16:
15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase
in the order in which they are declared. A pointer to a structure
object, suitably converted, points to its initial member (or
if that member is a bit-field, then to the unit in which it
resides), and vice versa. There may be unnamed padding within
a structure object, but not at its beginning.
16 The size of a union is sufficient to contain the largest of its members. The value of at most one of the
members can be stored in a union object at any time. A
pointer to a union object, suitably converted, points to each
of its members (or if a member is a bit-field, then to the
unit in which it resides), and vice versa.
It is then allowed to access b_attr->flags because of the union it resides in via 6.5.2.3p6.
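A cautious sketch of the union approach described above. The member name `attr` and the helpers `init_esock` and `esock_base_flags` are assumptions made for illustration, not part of the original answer. The access is deliberately written through the union member expression rather than through a free-standing `struct base *`, since that is the form least likely to be affected by type-based alias analysis.

```c
#include <assert.h>
#include <stddef.h>

struct base { unsigned int flags; };
struct extended_base { unsigned int flags; unsigned int mode; };

/* The union makes the relationship between the two base types visible. */
union u_base {
    struct base b_attr;
    struct extended_base e_attr;
};

struct extended_socket_object {
    union u_base attr;   /* "attr" is a hypothetical member name */
    int e_id;
    int e_type;
    int e_status;
    int some_other_field;
};

/* Stores through the extended view of the union. */
void init_esock(struct extended_socket_object *esock,
                unsigned int flags, unsigned int mode)
{
    assert(esock != NULL);
    esock->attr.e_attr.flags = flags;
    esock->attr.e_attr.mode = mode;
}

/* Inspects the common initial part through the other union member. */
unsigned int esock_base_flags(const struct extended_socket_object *esock)
{
    assert(esock != NULL);
    return esock->attr.b_attr.flags;
}
```

Because `attr` is the first member, a pointer to the whole object and a pointer to `attr` compare equal after conversion, which is what 6.7.2.1p15 describes.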
According to the C Standard (6.7.2.1 Structure and union specifiers, paragraph 13):
A pointer to a structure object, suitably converted, points to its
initial member (or if that member is a bit-field, then to the unit in
which it resides), and vice versa.
So, converting esock to struct extended_base * and then converting it to unsigned int * must give us a pointer to the flags field, according to the Standard.
I'm not sure if converting to struct base * counts as "suitably converted" or not. My guess is that it would work on any machine you try it on, but I wouldn't recommend it.
I think it would be safest (and also make the code more clear) if you simply keep a member of type struct base inside struct extended_base (instead of the member of type unsigned int). After doing that, you have two options:
When you want to send it to a function, write explicitly: esock->e_attr.base (instead of (struct base *)esock). This is what I would recommend.
You can also write: (struct base *) (struct extended_base *) esock which is guaranteed to work, but I think it is less clear, and also more dangerous (if in the future you deliberately or accidentally add another member at the beginning of the struct).
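The recommended restructuring might look like the sketch below. The member name `base` inside `struct extended_base` follows the answer's `esock->e_attr.base` expression, while the helper `mark_esock` is a hypothetical name added for illustration. No cast is needed because the argument already has type `struct base *`.

```c
#include <assert.h>
#include <stddef.h>

struct base { unsigned int flags; };

/* extended_base now embeds a real struct base as its first member. */
struct extended_base {
    struct base base;
    unsigned int mode;
};

struct extended_socket_object {
    struct extended_base e_attr;
    int e_id;
    int e_type;
    int e_status;
    int some_other_field;
};

/* Function intended for structs "deriving from struct base". */
unsigned int set_flag(struct base *objattr, unsigned int flag)
{
    assert(objattr != NULL);
    objattr->flags |= flag;
    return objattr->flags;
}

/* Hypothetical helper: passes the embedded base by address, no cast. */
unsigned int mark_esock(struct extended_socket_object *esock, unsigned int flag)
{
    return set_flag(&esock->e_attr.base, flag);
}
```

Taking the member's address keeps working even if another member is later added in front of `e_attr`, which is the hazard the cast-based version carries.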
After reading up into the standard's text following the other answers (thanks!!) I think I may try to answer my own question (which was a bit misleading to begin with, see below)
As the other answers pointed out, there appear to be two somewhat overlapping concerns in this question:
1. "common initial sequence" -- in the standard documents this specifically refers to the context of a union having several structs as members, when these member structs share some compatible members beginning from the first. (§6.5.2.3 "Structure and union members", p6 -- Thanks, @dbush!).
My reading: the language spec suggests that, if at the site of access to these "apparently" different structs it is made clear that they actually belong to the same union, and that the access is done through the union, it is permitted; otherwise, it is not.
I think the requirement is meant to work with type-based aliasing rules: if these structs do indeed alias each other, this fact must be made clear at compile time (by involving the union). When the compiler sees pointers to different types of structs, it can't, in the most general case, deduce whether they may have belonged to some union somewhere. In that case, if it invokes type-based alias analysis, the code will be miscompiled. So the standard requires that the union is made visible.
2. "a pointer (to struct), when suitably converted, points to its initial member" (§6.7.2.1 "Structure and union specifiers", p15) -- this sounds tantalizingly close to 1., but it's less about aliasing than about a) the implementation requirements for struct and b) "suitable conversion" of pointers. (Thanks, @Orielno!)
My reading: the "suitable conversion" appears to mean "see everything else in the standard", that is, no matter if the "conversion" is performed by type cast or assignment (or a series of them), being "suitable" suggests "all constraints must be satisfied at all steps". The "initial-member" rule, I think, simply says that the actual location of the struct is exactly the same as the initial member: there cannot be padding in front of the first member (this is explicitly stated in the same paragraph).
But no matter how we make use of this fact to convert pointers, the code must still be subject to constraints governing conversion, because a pointer is not just a machine representation of some location -- its value still has to be correctly interpreted in the context of types. A counterexample would be a conversion involving an assignment that discards const from the pointed-to type: this violates a constraint and cannot be suitable.
The somewhat misleading thing in my original post was to suggest that rule 2 had something to do with "common initial sequence", whereas it is not directly related to that concept.
So for my own question, I tend to answer, to my own surprise, "yes, it is valid". The reason is that the pointer conversion by cast in expression (struct base *)esock is "legal in the letter of the law" -- the standard simply says that (§6.5.4 "Cast operators", p3)
Conversions that involve pointers, other than where permitted by the constraints of 6.5.16.1 (note: constraints governing simple assignment), shall be specified by means of an explicit cast.
Since the expression is indeed an explicit cast, in and by itself it doesn't contradict the standard. The "conversion" is "suitable". Further function call to set_flag() correctly dereferences the pointer by virtue of the suitable conversion.
But! Indeed the "common initial sequence" becomes important when we want to improve the code. For example, in @dbush's answer, if we want to "inherit from multiple bases" via union, we must make sure that access to base is done where it's apparent that the struct is a member of the union. Also, as @Orielno pointed out, when the code makes us worry about its validity, perhaps switching to an explicitly safe alternative is better even if the code is valid in the first place.
In the language the C Standard was written to describe, an lvalue of the form ptr->memberName would use ptr's type to select a namespace in which to look up memberName, add the offset of that member to the address in ptr, and then access an object of that member type at that address. Once the address and type of the member were determined, the original structure object would play no further role in the processing of the expression.
When C99 was being written, there was a desire to avoid requiring that a compiler given something like:
struct position {double x,y,z; };
struct velocity {double dx,dy,dz; };
void update_positions(struct position *positions, struct velocity *vv, int count)
{
for (int i=0; i<count; i++)
{
positions[i].x += vv->dx;
positions[i].y += vv->dy;
positions[i].z += vv->dz;
}
}
must allow for the possibility that a write to e.g. positions[i].y might affect the object of vv->dy even when there is no evidence of any relationship between any object of type struct position and any object of type struct velocity. The Committee agreed that compilers shouldn't be required to accommodate interactions between different structure types in such cases.
I don't think anyone would have seriously disputed the notion that in situations where storage is accessed using a pointer which is freshly and visibly converted from one structure type to another, a quality compiler should accommodate the possibility that the operation might access a structure of the original type. The question of exactly when an implementation would accommodate such possibilities should depend upon what its customers were expecting to do, and was thus left as a quality-of-implementation issue outside the Standard's jurisdiction. The Standard wouldn't forbid implementations from being willfully blind to even the most obvious cases, but that's because the dumber something would be, the less need there should be to prohibit it.
Unfortunately, the authors of clang and gcc have misinterpreted the Standard's failure to forbid them from being obtusely blind to the possibility that a freshly-type-converted pointer might be used to access the same object as a pointer of the original type, as an invitation to behave in such fashion. When using clang or gcc to process any code which would need to make use of the Common Initial Sequence guarantees, one must use -fno-strict-aliasing. When using optimization without that flag, both clang and gcc are prone to behave in ways inconsistent with any plausible interpretation of the Standard's intent. Whether one views such behaviors as being a result of a really weird interpretation of the Standard, or simply as bugs, I see no reason to expect that gcc or clang will ever behave meaningfully in such cases.

Struct pointer casts

I'm trying to implement a linked list like this:
typedef struct SLnode
{
void* item;
void* next;
} SLnode;
typedef struct DLnode
{
void* item;
void* next;
struct DLnode* prev;
} DLnode;
typedef struct LinkedList
{
void* head; /*SLnode if doubly_linked is false, otherwise DLnode*/
void* tail; /* here too */
bool doubly_linked;
} LinkedList;
And I want to access it like this:
void* llnode_at(const LinkedList* ll, size_t index)
{
size_t i;
SLnode* current;
current = ll->head;
for(i = 0; i < index; i++)
{
current = current->next;
}
return current;
}
So my question is:
Am I allowed to cast between these structs as long as I only access the common members? I read differing opinions on this.
Could I also make the next-pointer of the respective types? Or would it be UB then to use it in my example function in case it really is DLnode?
In case this doesn't work, are there any other ways of doing something like this? I read that unions might work, but this code should also run in C89, and afaik reading a different union member than last written to is UB there.
So you are trying to build subclasses in C. A possible way is to make the base struct the first element of the child struct, because in that case the C standard explicitly allows casting back and forth between those two types:
6.7.2.1 Structure and union specifiers
§ 13 ... A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa...
The downside is that you need a cast to the base class to access its members:
Example code:
typedef struct SLnode
{
void* item;
void* next;
} SLnode;
typedef struct DLnode
{
struct SLnode base;
struct DLnode* prev;
} DLnode;
You can then use it that way:
DLnode *node = malloc(sizeof(DLnode));
((SLnode*) node)->next = NULL; // or node->base.next = NULL
((SLnode *)node)->item = val;
node->prev = NULL;
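Building on the base-as-first-member layout above, a traversal that serves both node kinds could be sketched as follows. This is an illustration under assumptions: `next` is given the type `struct SLnode *` instead of `void *`, and `dl_link` is a hypothetical helper that is not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SLnode {
    void *item;
    struct SLnode *next;
} SLnode;

typedef struct DLnode {
    SLnode base;            /* must remain the first member */
    struct DLnode *prev;
} DLnode;

/* Walks `index` links forward from head; returns NULL if the list is
 * shorter than that. Works for both node kinds because it only touches
 * the SLnode part. */
SLnode *llnode_at(SLnode *head, size_t index)
{
    SLnode *current = head;
    size_t i;
    for (i = 0; i < index && current != NULL; i++) {
        current = current->next;
    }
    return current;
}

/* Hypothetical helper: links b directly after a in a doubly linked list. */
void dl_link(DLnode *a, DLnode *b)
{
    assert(a != NULL && b != NULL);
    a->base.next = &b->base;
    b->prev = a;
}
```

A caller that knows the list is doubly linked can convert the result back with `(DLnode *)llnode_at(&head->base, n)`, relying on the same initial-member rule quoted above.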
You can do this safely provided you use a union to contain the two structures:
union Lnode {
struct SLnode slnode;
struct DLnode dlnode;
};
Section 6.5.2.3 of the current C standard, as well as section 6.3.2.3 of the C89 standard, states the following:
6 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common
initial sequence (see below), and if the union object currently
contains one of these structures, it is permitted to inspect the
common initial part of any of them anywhere that a declaration of the
completed type of the union is visible. Two structures share a common
initial sequence if corresponding members have compatible types (and,
for bit-fields, the same widths) for a sequence of one or more initial
members.
Because the first two members of both structures are of the same type, you can freely access those members using either union member.
What you describe should be allowed under the C Standard. The confusion of the Common Initial Sequence rule stems from a bigger problem: the Standard fails to specify when the use of a pointer or lvalue which is visibly derived from another is considered to be a use of the original. If the answer is "never", then any struct or union member of a non-character type would be pretty much useless, since the member would be an lvalue whose type isn't valid for accessing the struct or union. Such a view would clearly be absurd. If the answer is "only when it is formed by directly applying "." or "->" on the struct or union type, or a pointer to such a type", that would make the ability to use "&" on struct and union members rather useless. I'd regard that view as only slightly less absurd.
I think it's clear that in order to be useful the C language must be viewed as allowing derived lvalues to be used in at least some circumstances. Whether your code, or most code relying upon the Common Initial Sequence rule, is usable depends upon what those circumstances are.
The language would be rather silly if code couldn't reliably use derived lvalues to access structure members. Unfortunately, even though this problem was apparent in 1992 (it forms the underlying basis of Defect Report #028, published in that year) the Committee didn't address the fundamental issue but instead reached a correct conclusion based upon totally nonsensical logic, and has since gone and added needless complexity in the form of "Effective Types" without ever bothering to actually define the behavior of someStruct.member.
Consequently, there is no way to write any code which does much of anything with structs or unions without relying upon more behaviors than would actually be guaranteed by a literal reading of the Standard, whether such accesses are done by coercing void* or pointers to proper member types.
If one reads the intention of 6.5p7 as being to somehow allow actions which use an lvalue which is derived from one of a particular type to access objects of that type, at least in cases that don't involve actual aliasing (not a huge stretch, given footnote #88 "The intent of this list is to specify those circumstances in which an object may or may not be aliased."), and recognizes that aliasing requires that a region of storage be accessed using a reference X at a time when there exists another reference from which X was not visibly derived that will in future be used to access the storage in conflicting fashion, then compilers that honor that intention should be able to handle code like yours without difficulty.
Unfortunately, both gcc and clang seem to interpret 6.5p7 as saying that an lvalue which is derived from one of another type should often be presumed incapable of actually identifying objects of that former type even in cases where the derivation is fully visible.
Given something like:
struct s1 {int x;};
struct s2 {int x;};
union u {struct s1 v1; struct s2 v2;};
int test(union u arr[], int i1, int i2)
{
struct s1 *p1 = &arr[i1].v1;
if (p1->x)
{
struct s2 *p2 = &arr[i2].v2;
p2->x=23;
}
struct s1 *p3 = &arr[i1].v1;
return p3->x;
}
At the time p1->x is accessed, p1 is clearly derived from an lvalue of union type, and should thus be capable of accessing such an object, and the only other existing references that will ever be used to access the storage are references to that union type. Likewise when p2->x and p3->x are accessed. Unfortunately, both gcc and clang interpret N1570 6.5p7 as an indication that they should ignore the relationships between the union and the pointers to its members. If gcc and clang can't be relied upon to usefully allow code like the above to access the Common Initial Sequence of identical structures, I wouldn't trust them to reliably handle structures like yours either.
Unless or until the Standard is corrected to say under what cases a derived lvalue may be used to access a member of a struct or union, it's unclear that any code that does anything remotely unusual with structures or unions should be particularly expected to work under the -fstrict-aliasing dialects of gcc and clang. On the other hand, if one recognizes the concept of lvalue derivation as working both ways, a compiler might be justified in assuming that a pointer which is of one structure type won't be used in ways that would alias a reference to another, even if the pointer is cast to the second type before use. I'd therefore suggest that using void* would be less likely to run into trouble if the Standard ever fixes the rules.

Initialization of anonymous struct, workaround for gcc 4.9

I have the following struct types:
typedef struct PG_Point PG_Point;
struct PG_Point
{
int x;
int y;
};
typedef struct PG_Size PG_Size;
struct PG_Size
{
int width;
int height;
};
typedef struct PG_Bounds PG_Bounds;
struct PG_Bounds
{
union
{
struct
{
PG_Point topLeft;
PG_Size size;
};
struct
{
struct
{
int x;
int y;
};
struct
{
int width;
int height;
};
};
};
};
with the following initializers:
#define PG_Point_init(ix, iy) {.x=(ix), .y=(iy)}
#define PG_Size_init(iwidth, iheight) {.width=(iwidth), .height=(iheight)}
#define PG_Bounds_init(ix, iy, iwidth, iheight) { \
.topLeft=PG_Point_init((ix),(iy)), \
.size=PG_Size_init((iwidth),(iheight)) }
From what I understand, it's correct in c11 to initialize the fields of an anonymous struct as if they were directly fields of the containing struct? But with gcc 4.9.2, this gives the following warning:
warning: missing initializer for field ‘size’ of ‘struct <anonymous>’ [-Wmissing-field-initializers]
It works if I change the initializer to this version:
#define PG_Bounds_init(ix, iy, iwidth, iheight) {{{ \
.topLeft=PG_Point_init((ix),(iy)), \
.size=PG_Size_init((iwidth),(iheight)) }}}
That is, explicitly having the union and struct as sub aggregates.
Is this even allowed? Do I have to expect other compilers to reject this?
From what I understand, it's correct in c11 to initialize the fields of an anonymous struct as if they were directly fields of the containing struct?
There are two parts to that. First of all, we need to tackle the question of whether such members can be initialized at all, because Paragraph 6.7.2.1/13 identifies anonymous structure and union members as specific kinds of "unnamed members", and paragraph 6.7.9/9 says
Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization.
The rest of section 6.7.9 (Initialization) nowhere says anything that I would interpret as explicitly applying to anonymous structure and anonymous union members themselves, but I don't think the intent is to prevent initialization of the named members of anonymous members, especially given that they are considered members of the containing structure or union (see below). Thus, I do not interpret the standard to forbid the initialization you are trying to perform.
So yes, I read C11 to allow your initializer and to specify that it has the effect you appear to intend. In particular, paragraph 6.7.2.1/13 of the standard says, in part,
The members of an anonymous structure or union are considered to be members of the containing structure or union. This applies recursively if the containing structure or union is also anonymous.
Your initializer therefore satisfies the constraint in paragraph 6.7.9/7, that the designators within specify names of members of the current object (in your case, a struct PG_Bounds). The following paragraphs of section 6.7.9 present the semantics for initializers, and I see no reason to interpret them to specify anything other than initialization of the overall object with the values you have provided.
At this point, I reiterate that gcc is issuing a warning, not rejecting your code, and in this case I think the warning is spurious. I wrote a test program such as I suggested in comments that you do, and tried it on gcc 4.8.5 in C11 mode. Although gcc emitted the same warning you presented (but only with -Wextra enabled), I was able to demonstrate that your initializer initialized all members of a subject struct PG_Bounds to the intended values.
You also observe that gcc does not warn if you change the initializer to a version that uses nested brace-enclosed initializers, and ask
Is this even allowed? Do I have to expect other compilers to reject this?
This could be viewed as more problematic with respect to paragraph 6.7.9/9, so in that sense it is perhaps riskier. I am uncertain whether there is any compiler that actually rejects it or does the wrong thing with it. I think the intent of the standard is to allow this initializer, but I would prefer the other form, myself.
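A test program along the lines the answer describes might look like this sketch. It assumes a C11 compiler (anonymous structures and unions), and the function name `bounds_init_ok` is hypothetical. It uses the question's first, un-nested initializer and then reads every member through both views. The second view only lines up if the compiler lays out both anonymous structures identically, which common compilers do but which is treated here as an assumption.

```c
#include <assert.h>

typedef struct PG_Point { int x; int y; } PG_Point;
typedef struct PG_Size { int width; int height; } PG_Size;

typedef struct PG_Bounds {
    union {
        struct { PG_Point topLeft; PG_Size size; };
        struct {
            struct { int x; int y; };
            struct { int width; int height; };
        };
    };
} PG_Bounds;

#define PG_Point_init(ix, iy) {.x=(ix), .y=(iy)}
#define PG_Size_init(iwidth, iheight) {.width=(iwidth), .height=(iheight)}
#define PG_Bounds_init(ix, iy, iwidth, iheight) { \
    .topLeft=PG_Point_init((ix),(iy)), \
    .size=PG_Size_init((iwidth),(iheight)) }

/* Returns 1 if every member, seen through either view of the union,
 * holds the value passed to the initializer. */
int bounds_init_ok(void)
{
    PG_Bounds b = PG_Bounds_init(1, 2, 3, 4);

    /* The union must be able to hold the point and the size together. */
    assert(sizeof(PG_Bounds) >= sizeof(PG_Point) + sizeof(PG_Size));

    return b.topLeft.x == 1 && b.topLeft.y == 2
        && b.size.width == 3 && b.size.height == 4
        && b.x == 1 && b.y == 2
        && b.width == 3 && b.height == 4;
}
```

With gcc the `-Wmissing-field-initializers` warning may still appear when `-Wextra` is enabled; the check above concerns only the resulting values.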

Alignment of compound type objects in C90 and C99

Please consider the following types:
typedef struct { char myArray[300]; } MyStruct;
typedef union { char myArray[300]; } MyUnion;
typedef struct { uint64_t x; } MyStruct2;
typedef union { uint64_t x; } MyUnion2;
typedef struct { uint64_t x; char myArray[300]; } MyStruct3;
typedef union { uint64_t x; char myArray[300]; } MyUnion3;
I could find information about alignment and padding of members of compound types, but I am not sure about objects of those types themselves.
What alignment rules apply to objects of these types in RAM using C90 and C99 on X86 platform? Can the alignment change e.g. because the optimizer removes unused members (especially in unions)?
Alignment of objects is implementation defined. You should use compiler specific attributes if you want to set a specific value.
The compiler cannot reasonably be sure a member is unused, hence will not remove "unused" members from unions or structs (actually, unions are another matter because what would "unused" mean?).
The only situation I can think of when the compiler can be sure, is when only static or automatic variables are created in a compilation unit of a struct, which are never passed to a function outside the compilation unit and one or more members are never used in statements. And probably I forgot something that defeats this reasoning.
I think that in all other cases the compiler cannot be sure a member is not used. For example, if it is passed to a function in another compilation unit the compiler cannot change the definition because the other function will rely on the definition and might be using members not used in this compilation unit.
For a union the compiler can never be sure because the memory of the members is shared.
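Since the alignment is implementation-defined, the simplest approach is to ask the compiler. The sketch below uses `_Alignof`, which is a C11 feature and therefore not available in strict C90 or C99 mode. The function names are hypothetical, and the values returned will differ between platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { char myArray[300]; } MyStruct;
typedef struct { uint64_t x; char myArray[300]; } MyStruct3;
typedef union  { uint64_t x; char myArray[300]; } MyUnion3;

/* Alignment the implementation chose for each type (C11 _Alignof). */
size_t struct1_alignment(void) { return _Alignof(MyStruct); }
size_t struct3_alignment(void) { return _Alignof(MyStruct3); }
size_t union3_alignment(void)  { return _Alignof(MyUnion3); }

/* sizeof includes any trailing padding, so consecutive array elements
 * of the type stay correctly aligned. */
int size_is_multiple_of_alignment(void)
{
    assert(struct3_alignment() != 0 && union3_alignment() != 0);
    return sizeof(MyStruct3) % struct3_alignment() == 0
        && sizeof(MyUnion3) % union3_alignment() == 0;
}
```

A struct or union is aligned at least as strictly as its most strictly aligned member, so `MyStruct3` and `MyUnion3` cannot be less aligned than `uint64_t` on the same implementation.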

Typechecking in const anonymous union

First off, typechecking is not exactly the correct term I'm looking for, so I'll explain:
Say I want to use an anonymous union, I make the union declaration in the struct const, so after initialization the values will not change. This should allow for statically checking whether the uninitialized member of the union is being accessed. In the below example the instances are initialized for either a (int) or b (float). After this initialization I would like to not be able to access the other member:
struct Test{
const union{
const int a;
const float b;
};
};
int main(){
struct Test intContainer = { .a=5 };
struct Test floatContainer = { .b=3.0 };
int validInt = intContainer.a;
int validFloat = floatContainer.b;
// For these, it could be statically determined that these values are not in use (therefore invalid access)
int invalidInt = floatContainer.a;
float invalidFloat = intContainer.b;
return 0;
}
I'd hope to have the last two assignments to give an error (or at least a warning), but it gives none (using gcc 4.9.2). Is C designed to not check for this, or is it actually a shortcoming of the language/compiler? Or is it just plain stupid to want to use such a pattern?
In my eyes it looks like it would have a lot of potential if this were a feature, so can someone explain to me why I can't use this as a way to differentiate between two "sub-types" of the same struct (one for each union value)? (Any suggestions on how I can still do something like this?)
EDIT:
So apparently it is not in the language standard, and compilers don't check it either. Still, I personally think it would be a good feature to have, since it would eliminate manually checking the union's contents using tagged unions. So I wonder, does anyone have an idea why it is not featured in the language (or its compilers)?
I'd hope to have the last two assignments to give an error (or at least a warning), but it gives none (using gcc 4.9.2). Is C designed to not check for this, or is it actually a shortcoming of the language/compiler?
This is a correct behavior of the compiler.
float invalidInt = floatContainer.a;
float invalidFloat = intContainer.b;
In the first declaration you are initializing a float object with an int value and in the second you are initializing a float object with a float value. In C you can assign (or initialize) any arithmetic types to any arithmetic types without any cast required. So no diagnostic required.
In your specific case you are also reading union members that are not the same members as the union member last used to store its value. Assuming the union members are of the same size (e.g., float and int here), this is a specified behavior and no diagnostic is required. If the sizes of the union members are different, the behavior is unspecified (but still, no diagnostic required).
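Since the compiler performs no such check, the usual substitute is the tagged union the question mentions, with the check done at run time. The following is only a sketch: the `TestKind` enum, the constructor functions, and the accessors are hypothetical names, the members are left non-const for brevity, and the anonymous union requires C11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum TestKind { TEST_INT, TEST_FLOAT };   /* hypothetical tag */

struct Test {
    enum TestKind kind;
    union {
        int a;
        float b;
    };
};

struct Test make_int(int v)
{
    struct Test t = { .kind = TEST_INT, .a = v };
    return t;
}

struct Test make_float(float v)
{
    struct Test t = { .kind = TEST_FLOAT, .b = v };
    return t;
}

/* Each accessor succeeds only when the tag matches the requested member. */
bool get_int(const struct Test *t, int *out)
{
    assert(t != NULL && out != NULL);
    if (t->kind != TEST_INT) {
        return false;
    }
    *out = t->a;
    return true;
}

bool get_float(const struct Test *t, float *out)
{
    assert(t != NULL && out != NULL);
    if (t->kind != TEST_FLOAT) {
        return false;
    }
    *out = t->b;
    return true;
}
```

The mismatch is reported through the return value instead of a compile-time diagnostic, which is the trade-off the question was hoping to avoid.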

Resources