Quick, simple question: does this
typedef struct {int a; int b;} S1;
typedef struct {int a;} S2;
((S2*)(POINTER_TO_AN_S1))->a=1;
Always return (and assign) the member a of the structure? Or is it undefined behavior?
In a conforming compiler, if both structure types appear within the complete definition of a union type which is visible where the structure is accessed, and if the target of the pointer happened to be an instance of that union type, behavior would be defined. Note that the Standard does not require that the compiler have any way of knowing that the target of the pointer is actually an object of that union type--merely that the declaration of the complete union type be visible.
Note, however, that gcc does not abide by the Standard here, unless the -fno-strict-aliasing flag is used. Even in cases where the complete union type is visible, and a compiler can see that it is in fact working with objects of the union type, gcc ignores the aliasing. For example, given:
struct s1 {int x;};
struct s2 {int x;};
union u { struct s1 s1; struct s2 s2;};
int read_s1_x(struct s1 *p) { return p->x; }
int read_s2_x(struct s2 *p) { return p->x; }
void write_s1_x(struct s1 *p, int value) { p->x = value; }
void write_s2_x(struct s2 *p, int value) { p->x = value; }
int test(union u *u1, union u *u2)
{
write_s2_x(&u2->s2, 0);
if (!read_s1_x(&u1->s1))
write_s2_x(&u2->s2, 1);
return read_s1_x(&u1->s1);
}
gcc will decide that it doesn't need to re-read the value of
u1->s1.x after it writes u2->s2.x, even though the complete union type
is visible and even though a compiler can see that both u1 and u2 are
pointers to objects of the union type. I'm not quite sure what the
authors of gcc think the address-of operator is supposed to mean when
applied to a union type if the resulting pointer can't even be used to
immediately access an object of that member type.
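For contrast, GCC's documentation of -fstrict-aliasing says that type punning through a union is allowed provided the memory is accessed through the union type. A minimal sketch, assuming that documented behaviour, rewrites test() so that every access names the union member directly instead of passing a struct pointer to a helper; test_direct is a hypothetical name.

```c
/* Same types as in the example above. */
struct s1 { int x; };
struct s2 { int x; };
union u { struct s1 s1; struct s2 s2; };

/* Every access names the union object (u1->s1.x, u2->s2.x) instead of
   going through a separate struct s1 * or struct s2 *. */
int test_direct(union u *u1, union u *u2)
{
    u2->s2.x = 0;
    if (!u1->s1.x)
        u2->s2.x = 1;
    return u1->s1.x; /* 1 when u1 == u2 */
}
```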
I have read a lot about type punning and how it is not good to just use a cast.
oldType* data = malloc(sizeof(oldType));
((newtype*)data)->newElement;
This results in undefined behavior. So the solution is to use a union, so that the compiler knows these two pointers are linked to one another and doesn't do funny things with strict aliasing.
That being said, the unions I saw looked like:
union testing
{
struct test1 e;
struct test2 f;
};
Is it defined behavior if pointers are used in the union?
union testing
{
struct test1* e;
struct test2* f;
};
Here is a full example:
#include <stdio.h>
#include <stdlib.h>
struct test1
{
int a;
char b;
};
struct test2
{
int c;
char d;
};
union testing
{
struct test1* e;
struct test2* f;
};
void printer(const struct test2* value);
int main()
{
struct test1* aQuickTest = malloc(sizeof(struct test1));
aQuickTest->a = 42;
aQuickTest->b = 'a';
printer(((union testing)aQuickTest).f);
(((union testing)aQuickTest).f)->c = 111; // using -> not .
return 0;
}
void printer(const struct test2* value)
{
printf("Int: %i Char: %c",value->c, value->d);
}
Or would I need to use unions without pointers, and then use printer(&(((union testing)aQuickTest).f)); (with the &) to get the address of f?
It is non-conforming to cast to a union type, as your code does:
printer(((union testing)aQuickTest).f);
For that reason, your code does have undefined behavior as far as the Standard is concerned.
More directly to the point, however, no, your approach of putting pointers into a union does not avoid strict aliasing violations with respect to the pointed-to types, even without the casting issue. In your case, the effect is that where your union testing is in scope, implementations cannot assume that objects of type struct test1 ** and struct test2 ** do not alias each other. That does not prevent the undefined behavior that results from accessing an object with effective type struct test1 through an lvalue of type struct test2.
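A minimal sketch of the pointer-free alternative the question asks about, assuming the structures themselves are placed in the union: struct test1 and struct test2 share a common initial sequence (an int followed by a char), so a value stored through .e can be read back through .f. The helper as_test2 is hypothetical, and it avoids the non-conforming cast to a union type.

```c
#include <stdio.h>

struct test1 { int a; char b; };
struct test2 { int c; char d; };

/* The union holds the structures themselves, not pointers to them. */
union testing {
    struct test1 e;
    struct test2 f;
};

void printer(const struct test2 *value)
{
    printf("Int: %i Char: %c\n", value->c, value->d);
}

/* Hypothetical helper: store a struct test1 in the union, then read the
   same bytes back as a struct test2. Both members of each structure lie in
   their common initial sequence. */
struct test2 as_test2(struct test1 value)
{
    union testing t;
    t.e = value;
    return t.f;
}
```

The returned struct test2 is an ordinary object, so calling printer(&w) on it raises no aliasing question at all.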
Suppose you want to type-pun types X and Y; then you should use the union:
typedef union {
X x;
Y y;
}X_Y;
This allows you to share the bit representation of X with Y and vice versa.
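As a concrete sketch of the value union, assuming X is float and Y is uint32_t, and that float is a 32-bit IEEE 754 type (the expected bit patterns in the test rely on that): writing one member and reading the other reinterprets the bytes. The helper float_bits is hypothetical.

```c
#include <stdint.h>

/* Assumption: float is a 32-bit type, so the two members overlap exactly. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits");

/* X = float, Y = uint32_t */
typedef union {
    float x;
    uint32_t y;
} X_Y;

/* Hypothetical helper: write the float member, read the integer member.
   The stored bytes are reinterpreted as a uint32_t (C11 6.5.2.3, footnote 95). */
uint32_t float_bits(float f)
{
    X_Y pun;
    pun.x = f;
    return pun.y;
}
```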
If you use:
typedef union {
X* x;
Y* y;
}X_Y_p;
you are sharing the bit representations of the pointers. On a system that uses the same bit representation for all pointers, you are essentially casting a pointer to X to a pointer to Y, which, as you identified, causes undefined behaviour.
It is not illegal to have something like X_Y_p, because X* and Y* are types in their own right. But it achieves something different: it lets you type-pun pointers, which is not what you want to do (and is not necessary in most cases, because pointers share a representation on most systems). A cast should be fine there.
I have heard conflicting things about the extent to which the C standards guarantee structure layout consistency. Arguments for a limited extent have mentioned strict aliasing rules. For example, compare these two answers: https://stackoverflow.com/a/3766251/1306666 and https://stackoverflow.com/a/3766967/1306666.
In the following code I assume that char *id is in the same place in all of the structures foo, bar, and struct { char *id; }, making it safe to cast between them if it is the only member accessed.
Regardless of whether the cast will ever result in an error, does it violate strict aliasing rules?
#include <string.h>
struct foo {
char *id;
int a;
};
struct bar {
char *id;
int x, y, z;
};
struct list {
struct list *next;
union {
struct foo *foop;
struct bar *barp;
void *either;
} ptr;
};
struct list *find_id(struct list *l, char *key)
{
while (l != NULL) {
/* cast to anonymous struct and dereferenced */
if (!strcmp(((struct { char *id; } *)(l->ptr.either))->id, key))
return l;
l = l->next;
}
return NULL;
}
gcc -o /dev/null -Wstrict-aliasing test.c
Note that gcc produces no diagnostics.
Yes, there are multiple aliasing-related issues in your program. The use of the lvalue with anonymous structure type, which does not match the type of the underlying object, results in undefined behavior. It could be fixed with something like:
*(char**)((char *)either + offsetof(struct { ... char *id; ... }, id))
if you know the id member is at the same offset in all of them (e.g. they all share the same prefix). But in your specific case, where it's the first member, you can just do:
*(char**)either
because it's always valid to convert a pointer to a struct to a pointer to its first member (and back).
A separate issue is that your use of the union is wrong. The biggest issue is that it assumes struct foo *, struct bar *, and void * all have the same size and representation, which is not guaranteed. Also, it's arguably undefined to access a member of the union other than the one which was previously stored, but as a result of interpretations in defect reports, it's probably safe to say it's equivalent to a "reinterpret cast". But that gets you back to the issue of wrongly assuming same size/representation.
You should just remove the union, use a void * member, and convert the value (rather than reinterpret the bits) to the right pointer type to access the pointed-to structure (struct foo * or struct bar *) or its initial id field (char *).
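A minimal sketch of that suggestion, keeping the question's struct foo and struct bar: the union is replaced by a plain void *either member (an assumption about how the list would be restructured), and the value is converted to char ** rather than reinterpreted, relying on the rule that a pointer to a structure, suitably converted, points to its first member.

```c
#include <stddef.h>
#include <string.h>

struct foo { char *id; int a; };
struct bar { char *id; int x, y, z; };

/* The pointer union is replaced by a plain void * that holds the address
   of either a struct foo or a struct bar. */
struct list {
    struct list *next;
    void *either;
};

struct list *find_id(struct list *l, const char *key)
{
    for (; l != NULL; l = l->next) {
        /* Convert (not reinterpret) the void * to a pointer to the first
           member, char *id, which both structures share. */
        char **idp = l->either;
        if (strcmp(*idp, key) == 0)
            return l;
    }
    return NULL;
}
```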
I've been trying to work out how legal the below is and I could really use some help.
#include <stdio.h>
#include <stdlib.h>
typedef struct foo {
int foo;
int bar;
} foo;
void make_foo(void * p)
{
foo * this = (foo *)p;
this->foo = 0;
this->bar = 1;
}
typedef struct more_foo {
int foo;
int bar;
int more;
} more_foo;
void make_more_foo(void * p)
{
make_foo(p);
more_foo * this = (more_foo *)p;
this->more = 2;
}
int main(void)
{
more_foo * mf = malloc(sizeof(more_foo));
make_more_foo(mf);
printf("%d %d %d\n", mf->foo, mf->bar, mf->more);
return 0;
}
As far as I've gathered, doing this is type punning and is supposed to violate the strict aliasing rule. Does it, though? The pointers passed around are void *. You are allowed to interpret a void pointer any way you wish, correct?
Also, I read that there may be memory alignment issues. But struct alignment is deterministic. If the initial members are the same, then they'll get aligned the same way, and there should be no problems accessing all foo members from a more_foo pointer. Is that correct?
GCC compiles with -Wall without warnings, the program runs as expected. However, I'm not sure if it's UB or not and why.
I also saw that this:
typedef union baz {
struct foo f;
struct more_foo mf;
} baz;
void some_func(void)
{
baz b;
more_foo * mf = &b.mf; // or more_foo * mf = (more_foo *)&b;
make_more_foo(mf);
printf("%d %d %d\n", mf->foo, mf->bar, mf->more);
}
seems to be allowed. Because of the polymorphic nature of unions, the compiler would be OK with it. Is that correct? Does that mean that by compiling with strict aliasing off you don't have to use a union and can use only structs instead?
Edit: union baz now compiles.
The authors of the Standard didn't think it necessary to specify any means by which an lvalue of a struct or union's member type may be used to access the underlying struct or union. The way N1570 6.5p7 is written doesn't even allow for someStruct.member = 4; unless member is of character type. Being able to apply the & operator to struct and union members wouldn't make any sense, however, unless the authors of the Standard expected that the resulting pointers would be useful for something. Given footnote 88: "The intent of this list is to specify those circumstances in which an object may or may not be aliased", the most logical expectation is that it was only intended to apply in cases where lvalues' useful lifetimes would overlap in ways that would involve aliasing.
Consider the two functions within the code below:
struct s1 {int x;};
struct s2 {int x;};
union {struct s1 v1; struct s2 v2;} arr[10];
int test1(int i, int j)
{
int result;
{ struct s1 *p1 = &arr[i].v1; result = p1->x; }
if (result)
{ struct s2 *p2 = &arr[j].v2; p2->x = 2; }
{ struct s1 *p3 = &arr[i].v1; result = p3->x; }
return result;
}
int test2(int i, int j)
{
int result;
struct s1 *p1 = &arr[i].v1; result = p1->x;
if (result)
{ struct s2 *p2 = &arr[j].v2; p2->x = 2; }
result = p1->x;
return result;
}
In test1, even if i==j, all storage that will ever be accessed during p1's lifetime will be accessed through p1, so p1 won't alias anything. Likewise with p2 and p3. Thus, since there is no aliasing, there should be no problem if i==j. In test2, however, if i==j, then the creation of p1 and the last use of it to access p1->x would be separated by another action which accesses that storage with a pointer not derived from p1. Consequently, if i==j, then the access via p2 would alias p1, and per N1570 6.5p7 a compiler would not be required to allow for that possibility.
If the rules of 6.5p7 are applicable even in cases that don't involve actual aliasing, then structures and unions would be pretty useless. If they only apply in cases that do involve actual aliasing, then a lot of needless complexity like the "Effective Type" rules could be done away with. Unfortunately, some compilers like gcc and clang use the rules to justify "optimizing" the first function above and then assuming that they don't have to worry about the resulting alias which is present in their "optimized" version but wasn't in the original.
Your code will work fine in any compiler whose authors make any effort to recognize derived lvalues. Both gcc and clang, however, will botch even the test1() function above unless they are invoked with the -fno-strict-aliasing flag. Given that the Standard doesn't even allow for someStruct.member = 4;, I'd suggest that you refrain from the kind of aliasing seen in test2() above and not bother targeting compilers that can't even handle test1().
I'd say it isn't strict, since if you change the "foo" structure, the "more foo" structure will have to change with it. "foo" must become the base of "more foo"; this is inheritance, not quite polymorphism. But you can use function pointers to introduce polymorphism to help with these structures.
Example
#include <stdio.h>
#include <stdlib.h>
#define NEW(x) ((x*)malloc(sizeof(x)))
typedef struct
{
void(*printme)(void*);
int _foo;
int bar;
} foo;
typedef struct
{
// inherits foo
foo base;
int more;
} more_foo;
void foo_print(void *t)
{
foo *this = (foo*)t;
printf("[foo]\r\n\tfoo=%d\r\n\tbar=%d\r\n[/foo]\r\n", this->_foo, this->bar);
}
void more_foo_print(void *t)
{
more_foo *this = t;
printf("[more foo]\r\n");
foo_print(&this->base);
printf("\tmore=%d\r\n", this->more);
printf("[/more foo]\r\n");
}
void foo_construct( foo *this, int foo, int bar )
{
this->_foo = foo;
this->bar = bar;
this->printme = foo_print;
}
void more_foo_construct(more_foo *t, int _foo, int bar, int more)
{
foo_construct((foo*)t, _foo, bar);
t->more = more;
// Overrides printme
t->base.printme = more_foo_print;
}
more_foo *new_more_foo(int _foo, int bar, int more)
{
more_foo * new_mf = NEW(more_foo);
more_foo_construct(new_mf, _foo, bar, more);
return new_mf;
}
foo *new_foo(int _foo, int bar)
{
foo *new_f = NEW(foo);
foo_construct(new_f, _foo, bar);
return new_f;
}
int main(void)
{
foo * mf = (foo*)new_more_foo(1, 2, 3);
foo * f = new_foo(7,8);
mf->printme(mf);
f->printme(f);
return 0;
}
printme() is overridden when creating "more foo". (polymorphism)
more_foo includes foo as a base structure (inheritance), so when the "foo" structure changes (for example, when new members are added), "more foo" changes with it.
more_foo can be cast as "foo".
If I have these structures:
typedef struct { int x; } foo;
typedef struct { foo f; } bar;
Normally you would access x through b.f.x, but is there a way to set this up so that you can access element x without referring to f?
bar b;
b.x = ...
My first intuition is that you can't since there would be a possibility for name conflicts if two sub structures both had a member x and I can't figure out what the compile error would be. However, I recall working in some frameworks where this was possible.
In C++ I worked in a framework once where bar existed, and you could access its members as member variables this->x from a different class. I'm trying to figure out how that could be done.
You can with C11:
§ 6.7.2.1 -- 11
An unnamed member whose type specifier is a structure specifier with no tag is called an
anonymous structure; an unnamed member whose type specifier is a union specifier with
no tag is called an anonymous union. The members of an anonymous structure or union
are considered to be members of the containing structure or union. This applies
recursively if the containing structure or union is also anonymous.
So this code might work:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct { foo; } bar;
int main(void)
{
bar b;
b.x = 1;
printf("%d\n", b.x);
}
The problem here is that, in my tests, different compilers disagree on whether a typedef is acceptable as a struct specifier with no tag. The standard specifies:
§ 6.7.8 -- 3
In a declaration whose storage-class specifier is typedef, each declarator defines an
identifier to be a typedef name that denotes the type specified for the identifier in the way
described in 6.7.6. [...] A typedef declaration does not introduce a new type, only a
synonym for the type so specified.
(emphasis mine) -- But does synonym also mean a typedef-name specifier is exchangeable for a struct specifier? gcc accepts this, clang doesn't.
Of course, there's no way to express the whole member of type foo with these declarations, you sacrifice your named member f.
Concerning your doubt about name collisions, this is what gcc has to say when you put another int x inside bar:
structinherit.c:4:27: error: duplicate member 'x'
typedef struct { foo; int x; } bar;
^
To avoid ambiguity, you can just repeat the struct, possibly #defined as a macro, but of course, this looks a bit ugly:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct { struct { int x; }; } bar;
int main(void)
{
bar b;
b.x = 1;
printf("%d\n", b.x);
}
But any conforming compiler should accept this code, so stick to this version.
<opinion>This is a pity, I like the syntax accepted by gcc much better, but as the wording of the standard doesn't make it explicit to allow this, the only safe bet is to assume it's forbidden, so clang is not to blame here...</opinion>
If you want to refer to x by either b.x or b.f.x, you can use an additional anonymous union like this:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct {
union { struct { int x; }; foo f; };
} bar;
int main(void)
{
bar b;
b.f.x = 2;
b.x = 1;
printf("%d\n", b.f.x); // <-- guaranteed to print 1
}
This will not cause aliasing issues because of
§ 6.5.2.3 -- 6
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members
C: Highly unrecommended, but doable:
#include <stdio.h>
#define BAR_STRUCT struct { int x; }
typedef BAR_STRUCT bar;
typedef struct {
union {
bar b;
BAR_STRUCT;
};
} foo;
int main() {
foo f;
f.x = 989898;
printf("%d %d", f.b.x, f.x);
return 0;
}
Anonymous structs are a widespread extension in standards before C11.
C++:
You can do the same here as in C, but anonymous structs are not part of any C++ standard; they are an extension.
Better use inheritance, or do not use this shortcut at all.
Of course, do not use something like #define x b.x.
In C you can't access members of members like this.
You can however access members of an anonymous inner struct:
struct bar {
struct {
int x;
};
};
...
struct bar b;
b.x = 1;
In C++ you use inheritance:
struct foo {
int x;
};
struct bar: public foo {
};
...
struct bar b;
b.x = 1;
In C (99 and onward) you can access the common initial subsequence of union members, even if they weren't the last member written to.
In C11, you can have anonymous union members. So:
typedef struct { int x; } foo;
typedef struct {
union {
foo f;
int x;
};
} bar;
Yes, that applies to structures. But according to the standard:
A structure pointer, suitably converted, points to the first member.
A union pointer, suitably converted, points to any union member.
So their location in memory is the same.
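A small sketch of the anonymous-union version above, assuming a C11 compiler: it checks that b.x and b.f.x name the same address, which is what the two pointer-conversion rules imply. same_address is a hypothetical helper.

```c
/* Requires C11 for the anonymous union member. */
typedef struct { int x; } foo;

typedef struct {
    union {
        foo f;
        int x;
    };
} bar;

/* Hypothetical helper: nonzero when b->x and b->f.x have the same address.
   The union places both members at its start, and f.x is f's first member. */
int same_address(bar *b)
{
    return (void *)&b->x == (void *)&b->f.x;
}
```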
This is not possible in C. In C++ however you can use inheritance which is probably what you were thinking about.
In C++, you can use inheritance and member name conflicts are sort of resolvable with :: and treating the base classes as members.
struct foo { int x; };
struct bar : foo { };
struct foo1 { int x; };
struct bar1 : foo1 { char const* x; };
bar b;
bar1 b1;
int main()
{
return b.x + b1.foo1::x;
}
In standard C, it's impossible, however several compilers (gcc, clang, tinycc) support a similar thing as an extension (usually accessible with -fms-extensions (on gcc also with -fplan9-extensions which is a superset of -fms-extensions)), which allows you to do:
struct foo { int x; };
struct bar { struct foo; };
struct bar b = { 42 };
int main()
{
return b.x;
}
However, there's no resolution for conflicting member names with it, AFAIK.
In C++, it is possible in two ways. The first is to use inheritance. The second is for bar to contain a reference member named x (int &x), and constructors that initialise x to refer to f.x.
In C, it is not possible.
Since the C standard guarantees that there isn't padding before the first member of a struct, there isn't padding before the foo in bar, and there isn't padding before the x in foo. So, a raw memory access to the start of bar will access bar::foo::x.
You could do something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct _foo
{
int x;
} foo;
typedef struct _bar
{
foo f;
} bar;
int main()
{
bar b;
int val = 10;
// Setting the value:
memcpy(&b, &val, sizeof(int));
printf("%d\n", b.f.x);
b.f.x = 100;
// Reading the value:
memcpy(&val, &b, sizeof(int));
printf("%d\n", val);
return 0;
}
As others have noted, C++ offers a more elegant way of doing this through inheritance.
Essentially, if I have
typedef struct {
int x;
int y;
} A;
typedef struct {
int h;
int k;
} B;
and I have A a, does the C standard guarantee that ((B*)&a)->k is the same as a.y?
Are C structs with the same member types guaranteed to have the same layout in memory?
Almost yes. Close enough for me.
From n1516, Section 6.5.2.3, paragraph 6:
... if a union contains several structures that share a common initial sequence ..., and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
This means that if you have the following code:
struct a {
int x;
int y;
};
struct b {
int h;
int k;
};
union {
struct a a;
struct b b;
} u;
If you assign to u.a, the standard says that you can read the corresponding values from u.b. It stretches the bounds of plausibility to suggest that struct a and struct b can have different layout, given this requirement. Such a system would be pathological in the extreme.
Remember that the standard also guarantees that:
Structures are never trap representations.
Addresses of fields in a structure increase (a.x is always before a.y).
The offset of the first field is always zero.
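These guarantees do not pin down every offset, but an implementation's actual layout can be checked at compile time. A sketch, assuming C11 _Static_assert and the A and B typedefs from the question; passing the checks documents the layout assumption only and does not make ((B*)&a)->k a valid access under the aliasing rules.

```c
#include <stddef.h>

typedef struct { int x; int y; } A;
typedef struct { int h; int k; } B;

/* C11 compile-time checks that this implementation lays A and B out alike.
   They document the assumption; they do not make ((B *)&a)->k a valid access
   under the aliasing rules. */
_Static_assert(sizeof(A) == sizeof(B), "A and B differ in size");
_Static_assert(offsetof(A, x) == offsetof(B, h), "x and h differ in offset");
_Static_assert(offsetof(A, y) == offsetof(B, k), "y and k differ in offset");
```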
However, and this is important!
You rephrased the question,
does the C standard guarantee that ((B*)&a)->k is the same as a.y?
No! And it very explicitly states that they are not the same!
struct a { int x; };
struct b { int x; };
int test(int value)
{
struct a a;
a.x = value;
return ((struct b *) &a)->x;
}
This is an aliasing violation.
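If the goal is only to reuse the bytes, one conforming route is to copy them into a genuine struct b object with memcpy, which accesses the representation as characters. A minimal sketch, assuming the two structures have the same size (checked with C11 _Static_assert); test_memcpy is a hypothetical name.

```c
#include <string.h>

struct a { int x; };
struct b { int x; };

/* Assumption, checked at compile time (C11): the two structures have the
   same size, so the copy below fills the whole struct b. */
_Static_assert(sizeof(struct a) == sizeof(struct b), "sizes differ");

/* Hypothetical variant of test(): copy the bytes of a into a real struct b
   object instead of reading a through a struct b * lvalue. */
int test_memcpy(int value)
{
    struct a a;
    struct b b;
    a.x = value;
    memcpy(&b, &a, sizeof b);
    return b.x;
}
```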
Piggybacking on the other replies with a warning about section 6.5.2.3. Apparently there is some debate about the exact wording of anywhere that a declaration of the completed type of the union is visible, and at least GCC doesn't implement it as written. There are a few tangential C WG defect reports here and here with follow-up comments from the committee.
Recently I tried to find out how other compilers (specifically GCC 4.8.2, ICC 14, and clang 3.4) interpret this using the following code from the standard:
// Undefined, result could (realistically) be either -1 or 1
struct t1 { int m; } s1;
struct t2 { int m; } s2;
int f(struct t1 *p1, struct t2 *p2) {
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g() {
union {
struct t1 s1;
struct t2 s2;
} u;
u.s1.m = -1;
return f(&u.s1,&u.s2);
}
GCC: -1, clang: -1, ICC: 1 and warns about the aliasing violation
// Global union declaration, result should be 1 according to a literal reading of 6.5.2.3/6
struct t1 { int m; } s1;
struct t2 { int m; } s2;
union u {
struct t1 s1;
struct t2 s2;
};
int f(struct t1 *p1, struct t2 *p2) {
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g() {
union u u;
u.s1.m = -1;
return f(&u.s1,&u.s2);
}
GCC: -1, clang: -1, ICC: 1 but warns about aliasing violation
// Global union definition, result should be 1 as well.
struct t1 { int m; } s1;
struct t2 { int m; } s2;
union u {
struct t1 s1;
struct t2 s2;
} u;
int f(struct t1 *p1, struct t2 *p2) {
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g() {
u.s1.m = -1;
return f(&u.s1,&u.s2);
}
GCC: -1, clang: -1, ICC: 1, no warning
Of course, without strict aliasing optimizations all three compilers return the expected result every time. Since clang and gcc don't produce different results in any of the cases, the only real information comes from ICC's lack of a diagnostic on the last one. This also aligns with the example given by the standards committee in the first defect report mentioned above.
In other words, this aspect of C is a real minefield, and you'll have to be wary that your compiler is doing the right thing even if you follow the standard to the letter. All the worse since it's intuitive that such a pair of structs ought to be compatible in memory.
This sort of aliasing specifically requires a union type. C11 §6.5.2.3/6:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
This example follows:
The following is not a valid fragment (because the union type is not
visible within function f):
struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g() {
union {
struct t1 s1;
struct t2 s2;
} u;
/* ... */
return f(&u.s1, &u.s2);
}
The requirements appear to be that 1. the object being aliased is stored inside a union and 2. that the definition of that union type is in scope.
For what it's worth, the corresponding initial-subsequence relationship in C++ does not require a union. And in general, such union dependence would be an extremely pathological behavior for a compiler. If there's some way the existence of a union type could affect a concrete memory model, it's probably better not to try to picture it.
I suppose the intent is that a memory access verifier (think Valgrind on steroids) can check a potential aliasing error against these "strict" rules.
I want to expand on @Dietrich Epp's answer. Here is a quote from C99:
6.7.2.1 point 14
... A pointer to a union object, suitably converted, points to each of its members ... and vice versa.
Which means we can copy the memory from a struct to a union containing it:
#include <string.h>
struct a
{
int foo;
char bar;
};
struct b
{
int foo;
char bar;
};
union ab
{
struct a a;
struct b b;
};
void test(struct a *aa)
{
union ab ab;
memcpy(&ab, aa, sizeof *aa);
// ...
}
C99 also says:
6.5.2.3 point 5
One special guarantee is made in order to simplify the use of unions: if a union contains
several structures that share a common initial sequence ..., and if the union
object currently contains one of these structures, it is permitted to inspect the common
initial part of any of them anywhere that a declaration of the complete type of the union is
visible. Two structures share a common initial sequence if corresponding members have
compatible types .... for a sequence of one or more initial members.
Which means the following will also be legal after the memcpy:
ab.a.bar;
ab.b.bar;
The struct could be initialized in a separate translation unit and the copying is done in the standard library (out of the control of the compiler).
Thus, memcpy will copy byte-by-byte the value of the object of type struct a and the compiler has to ensure the result is valid for both structs.
The compiler cannot do anything other than generate instructions that read from the corresponding memory offset for both of those lines, thus the address needs to be the same.
Even though it is not stated explicitly, I would say the standard implies that C-structs with the same member types have the same layout in memory.