Type Punning with Unions and Heap - c

I have read a lot about type punning and how it is not good to just use a cast.
oldType* data = malloc(sizeof(oldType));
((newtype*)data)->newElement;
This results in undefined behavior. So the solution is to use union so that the compiler knows that these two pointers are linked to one another so it doesn't do funny things with strict aliasing.
That being said the unions also looked like:
union testing
{
struct test1 e;
struct test2 f;
}
Is it defined behavior if pointers are used in the union?
union testing
{
struct test1* e;
struct test2* f;
}
Here is a full example:
#include <stdio.h>
#include <stdlib.h>
struct test1
{
int a;
char b;
};
struct test2
{
int c;
char d;
};
union testing
{
struct test1* e;
struct test2* f;
};
void printer(const struct test2* value);
int main()
{
struct test1* aQuickTest = malloc(sizeof(struct test1));
aQuickTest->a = 42;
aQuickTest->b = 'a';
printer(((union testing)aQuickTest).f);
((union testing)aQuickTest.f)->c = 111; // using -> not .
return 0;
}
void printer(const struct test2* value)
{
printf("Int: %i Char: %c",value->c, value->d);
}
Or would I need to use unions without pointers. And then use printer(&(((union testing)aQuickTest).f)); (with the &) to get the address of f.

It is non-conforming to cast to a union type, as your code does:
printer(((union testing)aQuickTest).f);
For that reason, your code does have undefined behavior as far as the Standard is concerned.
More directly to the point, however, no, your approach of putting pointers into a union does not avoid strict aliasing violations with respect to the pointed-to types, even without the casting issue. In your case, the effect is that where your union testing is in scope, implementations cannot assume that objects of type struct test1 ** and struct test2 ** do not alias each other. That does not prevent undefined behavior resulting from accessing an object with effective type struct test1 through an lvalue of type struct test2.
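One way to keep the heap allocation and still stay within the rules is to allocate the union itself, holding the structs by value, and to do every access through the union. The following is only a sketch under that assumption; the helper names testing_new, testing_c and testing_d are hypothetical and do not appear in the question.

```c
#include <assert.h>
#include <stdlib.h>

struct test1 { int a; char b; };
struct test2 { int c; char d; };

/* The union holds the structs themselves, not pointers to them. */
union testing {
    struct test1 e;
    struct test2 f;
};

/* Hypothetical helper: allocate the union on the heap and fill it as a test1. */
union testing *testing_new(int a, char b)
{
    union testing *u = malloc(sizeof *u);
    if (u == NULL)
        return NULL;
    u->e.a = a;
    u->e.b = b;
    return u;
}

/* Reads go through the union lvalue, so the compiler knows e and f overlap. */
int testing_c(const union testing *u)
{
    assert(u != NULL);
    return u->f.c;
}

char testing_d(const union testing *u)
{
    assert(u != NULL);
    return u->f.d;
}
```

Both structs share a common initial sequence (an int followed by a char), so inspecting f after storing through e is covered by the union guarantee. The caller is responsible for calling free on the result of testing_new.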

Suppose you want to type pun types X and Y, you should use the union -
typedef union {
X x;
Y y;
}X_Y;
This allows you to share the bit representation of X with Y and vice versa.
If you use -
typedef union {
X* x;
Y* y;
}X_Y_p;
you are sharing the bit representation of the pointers. On a system that uses the same bit representation for all pointers, you are essentially casting a pointer to X to a pointer to Y, which, as you identified, causes Undefined Behaviour.
It is not illegal to have something like X_Y_p, because X* and Y* are types in their own right. But it achieves something different: it lets you type pun the pointers, which is not what you want to do (and is not necessary in most cases, because pointers share a representation on most systems). A cast should be fine there.
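To make the first form concrete, here is a small sketch of a by-value union that shares the bit representation of two types. It assumes that float is 32 bits wide (checked at compile time, which needs C11); the names float_bits and bits_of_float are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float and uint32_t have the same size on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* X and Y held by value: the two members share the same bytes. */
typedef union {
    float    f;
    uint32_t u;
} float_bits;

/* Hypothetical helper: return the object representation of a float. */
uint32_t bits_of_float(float value)
{
    float_bits pun;
    pun.f = value;   /* store through one member ... */
    return pun.u;    /* ... and read the same bytes through the other */
}
```

On a platform that uses IEEE 754 single precision for float, bits_of_float(1.0f) yields 0x3F800000. Note that the union is accessed by value here; no pointer to one type is ever dereferenced as the other.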

Related

Does dereferencing a cast to an anonymous structure pointer violate strict aliasing?

I have heard conflicting things about the extent to which the C standards guarantee structure layout consistency. Arguments for a limited extent have mentioned strict aliasing rules. For example, compare these two answers: https://stackoverflow.com/a/3766251/1306666 and https://stackoverflow.com/a/3766967/1306666.
In the following code I assume in all structures foo, bar, and struct { char *id; } that char *id is in the same place, making it safe to cast between them if it is the only member accessed.
Regardless of whether the cast will ever result in an error, does it violate strict aliasing rules?
#include <string.h>
struct foo {
char *id;
int a;
};
struct bar {
char *id;
int x, y, z;
};
struct list {
struct list *next;
union {
struct foo *foop;
struct bar *barp;
void *either;
} ptr;
};
struct list *find_id(struct list *l, char *key)
{
while (l != NULL) {
/* cast to anonymous struct and dereferenced */
if (!strcmp(((struct { char *id; } *)(l->ptr.either))->id, key))
return l;
l = l->next;
}
return NULL;
}
gcc -o /dev/null -Wstrict-aliasing test.c
Note gcc gives no errors.
Yes, there are multiple aliasing-related issues in your program. The use of the lvalue with anonymous structure type, which does not match the type of the underlying object, results in undefined behavior. It could be fixed with something like:
*(char**)((char *)either + offsetof(struct { ... char *id; ... }, id))
if you know the id member is at the same offset in all of them (e.g. they all share same prefix). But in your specific case where it's the first member you can just do:
*(char**)either
because it's always valid to convert a pointer to a struct to a pointer to its first member (and back).
A separate issue is that your use of the union is wrong. The biggest issue is that it assumes struct foo *, struct bar *, and void * all have the same size and representation, which is not guaranteed. Also, it's arguably undefined to access a member of the union other than the one which was previously stored, but as a result of interpretations in defect reports, it's probably safe to say it's equivalent to a "reinterpret cast". But that gets you back to the issue of wrongly assuming same size/representation.
You should just remove the union, use a void * member, and convert the value (rather than reinterpret the bits) to the right pointer type to access the pointed-to structure (struct foo * or struct bar *) or its initial id field (char *).
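A rough sketch of that suggestion, assuming every element type keeps char *id as its first member (checked with static_assert, which needs C11). The member name item is an invented replacement for the union from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo { char *id; int a; };
struct bar { char *id; int x, y, z; };

/* Assumption: id is the first member of every element type. */
static_assert(offsetof(struct foo, id) == 0, "id must be first in struct foo");
static_assert(offsetof(struct bar, id) == 0, "id must be first in struct bar");

struct list {
    struct list *next;
    void *item;              /* points to a struct foo or a struct bar */
};

struct list *find_id(struct list *l, const char *key)
{
    while (l != NULL) {
        /* A pointer to a struct, suitably converted, points to its
           first member, so this reads the id field of either type. */
        char *id = *(char **)l->item;
        if (strcmp(id, key) == 0)
            return l;
        l = l->next;
    }
    return NULL;
}
```

To get at the other members, the caller converts item to struct foo * or struct bar * according to whatever tag or convention says which type is stored.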

C: Avoid indirection to refer to fields of struct inside struct [duplicate]

If I have these structures:
typedef struct { int x; } foo;
typedef struct { foo f; } bar;
Normally you would access x through b.f.x, but is there a way to set this up so that you can access element x without referring to f?
bar b;
b.x = ...
My first intuition is that you can't since there would be a possibility for name conflicts if two sub structures both had a member x and I can't figure out what the compile error would be. However, I recall working in some frameworks where this was possible.
In C++ I worked in a framework once where bar existed, and you could access its members as member variables this->x from a different class. I'm trying to figure out how that could be done.
You can with C11:
§ 6.7.2.1 -- 11
An unnamed member whose type specifier is a structure specifier with no tag is called an
anonymous structure; an unnamed member whose type specifier is a union specifier with
no tag is called an anonymous union. The members of an anonymous structure or union
are considered to be members of the containing structure or union. This applies
recursively if the containing structure or union is also anonymous.
So this code might work:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct { foo; } bar;
int main(void)
{
bar b;
b.x = 1;
printf("%d\n", b.x);
}
The problem here is that different compilers disagree in my tests on whether a typedef is acceptable as a struct specifier with no tag. The standard specifies:
§ 6.7.8 -- 3
In a declaration whose storage-class specifier is typedef, each declarator defines an
identifier to be a typedef name that denotes the type specified for the identifier in the way
described in 6.7.6. [...] A typedef declaration does not introduce a new type, only a
synonym for the type so specified.
(emphasis mine) -- But does synonym also mean a typedef-name specifier is exchangeable for a struct specifier? gcc accepts this, clang doesn't.
Of course, there's no way to express the whole member of type foo with these declarations, you sacrifice your named member f.
Concerning your doubt about name collisions, this is what gcc has to say when you put another int x inside bar:
structinherit.c:4:27: error: duplicate member 'x'
typedef struct { foo; int x; } bar;
^
To avoid ambiguity, you can just repeat the struct, possibly #defined as a macro, but of course, this looks a bit ugly:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct { struct { int x; }; } bar;
int main(void)
{
bar b;
b.x = 1;
printf("%d\n", b.x);
}
But any conforming compiler should accept this code, so stick to this version.
<opinion>This is a pity, I like the syntax accepted by gcc much better, but as the wording of the standard doesn't make it explicit to allow this, the only safe bet is to assume it's forbidden, so clang is not to blame here...</opinion>
If you want to refer to x by either b.x or b.f.x, you can use an additional anonymous union like this:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct {
union { struct { int x; }; foo f; };
} bar;
int main(void)
{
bar b;
b.f.x = 2;
b.x = 1;
printf("%d\n", b.f.x); // <-- guaranteed to print 1
}
This will not cause aliasing issues because of
§ 6.5.2.3 -- 6
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members
C: Highly unrecommended, but doable:
#include <stdio.h>
#define BAR_STRUCT struct { int x; }
typedef BAR_STRUCT bar;
typedef struct {
union {
bar b;
BAR_STRUCT;
};
} foo;
int main() {
foo f;
f.x = 989898;
printf("%d %d", f.b.x, f.x);
return 0;
}
Anonymous structs were a widespread extension in standards before C11.
C++:
You can do the same here as in C, but anonymous structs are not part of any C++ standard; they are an extension.
Better use inheritance, or do not use this shortcut at all.
Of course, do not use something like #define x b.x.
In C you can't access members of members like this.
You can however access members of an anonymous inner struct:
struct bar {
struct {
int x;
}; /* anonymous struct: note the semicolon */
};
...
struct bar b;
b.x = 1;
In C++ you use inheritance:
struct foo {
int x;
};
struct bar: public foo {
};
...
struct bar b;
b.x = 1;
In C (99 and onward) you can access the common initial sub-sequence of union members, even if they weren't the last member written to.
In C11, you can have anonymous union members. So:
typedef struct { int x; } foo;
typedef struct {
union {
foo f;
int x;
};
} bar;
Yes, that applies to structures. But according to the standard:
A structure pointer, suitably converted, points to the first member.
A union pointer, suitably converted, points to any union member.
So their location in memory is the same.
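A small sketch of that layout, with a compile-time check that the two names refer to the same location. It needs C11 for the anonymous union and static_assert; the accessor names bar_set_fx and bar_get_x are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x; } foo;

typedef struct {
    union {          /* anonymous union: f and x become members of bar */
        foo f;
        int x;
    };
} bar;

/* x and f start at the same offset, and f.x is the first member of f. */
static_assert(offsetof(bar, x) == offsetof(bar, f), "x and f must overlap");

/* Hypothetical accessors: write through one name, read through the other. */
void bar_set_fx(bar *b, int value)
{
    assert(b != NULL);
    b->f.x = value;
}

int bar_get_x(const bar *b)
{
    assert(b != NULL);
    return b->x;
}
```

After bar_set_fx(&b, 3), bar_get_x(&b) returns 3, because b.x and b.f.x are both an int stored in the same bytes.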
This is not possible in C. In C++ however you can use inheritance which is probably what you were thinking about.
In C++, you can use inheritance and member name conflicts are sort of resolvable with :: and treating the base classes as members.
struct foo { int x; };
struct bar : foo { };
struct foo1 { int x; };
struct bar1 : foo1 { char const* x; };
bar b;
bar1 b1;
int main()
{
return b.x + b1.foo1::x;
}
In standard C, it's impossible, however several compilers (gcc, clang, tinycc) support a similar thing as an extension (usually accessible with -fms-extensions (on gcc also with -fplan9-extensions which is a superset of -fms-extensions)), which allows you to do:
struct foo { int x; };
struct bar { struct foo; };
struct bar b = { 42 };
int main()
{
return b.x;
}
However, there's no resolution for conflicting member names with it, AFAIK.
In C++, it is possible in two ways. The first is to use inheritance. The second is for bar to contain a reference member named x (int &x), and constructors that initialise x to refer to f.x.
In C, it is not possible.
Since the C standard guarantees that there isn't padding before the first member of a struct, there isn't padding before the foo in bar, and there isn't padding before the x in foo. So, a raw memory access to the start of bar will access bar::foo::x.
You could do something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* memcpy */
typedef struct _foo
{
int x;
} foo;
typedef struct _bar
{
foo f;
} bar;
int main()
{
bar b;
int val = 10;
// Setting the value:
memcpy(&b, &val, sizeof(int));
printf("%d\n", b.f.x);
b.f.x = 100;
// Reading the value:
memcpy(&val, &b, sizeof(int));
printf("%d\n", val);
return 0;
}
As others have noted, C++ offers a more elegant way of doing this through inheritance.

C: casting to structure with different size

Quick simple question;
Does this
typedef struct {int a; int b;} S1;
typedef struct {int a;} S2;
((S2*)(POINTER_TO_AN_S1))->a=1;
Always return (and assign) the member a of the structure? Or is it undefined behavior?
In a conforming compiler, if both structure types appear within the complete definition of a union type which is visible where the structure is accessed, and if the target of the pointer happened to be an instance of that union type, behavior would be defined. Note that the Standard does not require that the compiler have any way of knowing that the target of the pointer is actually an object of that union type--merely that the declaration of the complete union type be visible.
Note, however, that gcc does not abide by the Standard here, unless the -fno-strict-aliasing flag is used. Even in cases where the complete union type is visible, and a compiler can see that it is in fact working with objects of the union type, gcc ignores the aliasing. For example, given:
struct s1 {int x;};
struct s2 {int x;};
union u { struct s1 s1; struct s2 s2;};
int read_s1_x(struct s1 *p) { return p->x; }
int read_s2_x(struct s2 *p) { return p->x; }
void write_s1_x(struct s1 *p, int value) { p->x = value; }
void write_s2_x(struct s2 *p, int value) { p->x = value; }
int test(union u *u1, union u *u2)
{
write_s2_x(&u2->s2, 0);
if (!read_s1_x(&u1->s1))
write_s2_x(&u2->s2, 1);
return read_s1_x(&u1->s1);
}
a compiler will decide that it doesn't need to re-read the value of
u1->s1.x after it writes u2->s2.x, even though the complete union type
is visible and even though a compiler can see that both u1 and u2 are
pointers to objects of the union type. I'm not quite sure what the
authors of gcc think the address-of operator is supposed to mean when
applied to a union type if the resulting pointer can't even be used to
immediately access an object of that member type.

Are C-structs with the same members types guaranteed to have the same layout in memory?

Essentially, if I have
typedef struct {
int x;
int y;
} A;
typedef struct {
int h;
int k;
} B;
and I have A a, does the C standard guarantee that ((B*)&a)->k is the same as a.y?
Are C-structs with the same members types guaranteed to have the same layout in memory?
Almost yes. Close enough for me.
From n1516, Section 6.5.2.3, paragraph 6:
... if a union contains several structures that share a common initial sequence ..., and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
This means that if you have the following code:
struct a {
int x;
int y;
};
struct b {
int h;
int k;
};
union {
struct a a;
struct b b;
} u;
If you assign to u.a, the standard says that you can read the corresponding values from u.b. It stretches the bounds of plausibility to suggest that struct a and struct b can have different layout, given this requirement. Such a system would be pathological in the extreme.
Remember that the standard also guarantees that:
Structures are never trap representations.
Addresses of fields in a structure increase (a.x is always before a.y).
The offset of the first field is always zero.
However, and this is important!
You rephrased the question,
does the C standard guarantee that ((B*)&a)->k is the same as a.y?
No! And it very explicitly states that they are not the same!
struct a { int x; };
struct b { int x; };
int test(int value)
{
struct a a;
a.x = value;
return ((struct b *) &a)->x;
}
This is an aliasing violation.
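If the goal is to view an A as a B without the pointer cast, one option is to verify the layout assumption at compile time and then copy the bytes. This is only a sketch: it assumes C11 for static_assert, and a_to_b is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { int x; int y; } A;
typedef struct { int h; int k; } B;

/* Turn the layout assumption into a compile-time check. */
static_assert(sizeof(A) == sizeof(B), "A and B differ in size");
static_assert(offsetof(A, x) == offsetof(B, h), "first members differ in offset");
static_assert(offsetof(A, y) == offsetof(B, k), "second members differ in offset");

/* Hypothetical helper: copy the bytes of an A into a fresh B.
   No lvalue of type B is ever used to access the A object. */
B a_to_b(const A *a)
{
    B b;
    assert(a != NULL);
    memcpy(&b, a, sizeof b);
    return b;
}
```

With A a = { 3, 4 }, the result of a_to_b(&a) has h equal to 3 and k equal to 4. Compilers commonly optimise such a small fixed-size memcpy into plain loads and stores, though that is a quality-of-implementation matter rather than a guarantee.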
Piggybacking on the other replies with a warning about section 6.5.2.3. Apparently there is some debate about the exact wording of anywhere that a declaration of the completed type of the union is visible, and at least GCC doesn't implement it as written. There are a few tangential C WG defect reports here and here with follow-up comments from the committee.
Recently I tried to find out how other compilers (specifically GCC 4.8.2, ICC 14, and clang 3.4) interpret this using the following code from the standard:
// Undefined, result could (realistically) be either -1 or 1
struct t1 { int m; } s1;
struct t2 { int m; } s2;
int f(struct t1 *p1, struct t2 *p2) {
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g() {
union {
struct t1 s1;
struct t2 s2;
} u;
u.s1.m = -1;
return f(&u.s1,&u.s2);
}
GCC: -1, clang: -1, ICC: 1 and warns about the aliasing violation
// Global union declaration, result should be 1 according to a literal reading of 6.5.2.3/6
struct t1 { int m; } s1;
struct t2 { int m; } s2;
union u {
struct t1 s1;
struct t2 s2;
};
int f(struct t1 *p1, struct t2 *p2) {
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g() {
union u u;
u.s1.m = -1;
return f(&u.s1,&u.s2);
}
GCC: -1, clang: -1, ICC: 1 but warns about aliasing violation
// Global union definition, result should be 1 as well.
struct t1 { int m; } s1;
struct t2 { int m; } s2;
union u {
struct t1 s1;
struct t2 s2;
} u;
int f(struct t1 *p1, struct t2 *p2) {
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g() {
u.s1.m = -1;
return f(&u.s1,&u.s2);
}
GCC: -1, clang: -1, ICC: 1, no warning
Of course, without strict aliasing optimizations all three compilers return the expected result every time. Since clang and gcc don't have distinguished results in any of the cases, the only real information comes from ICC's lack of a diagnostic on the last one. This also aligns with the example given by the standards committee in the first defect report mentioned above.
In other words, this aspect of C is a real minefield, and you'll have to be wary that your compiler is doing the right thing even if you follow the standard to the letter. All the worse since it's intuitive that such a pair of structs ought to be compatible in memory.
This sort of aliasing specifically requires a union type. C11 §6.5.2.3/6:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
This example follows:
The following is not a valid fragment (because the union type is not
visible within function f):
struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
if (p1->m < 0)
p2->m = -p2->m;
return p1->m;
}
int g() {
union {
struct t1 s1;
struct t2 s2;
} u;
/* ... */
return f(&u.s1, &u.s2);
}
The requirements appear to be that 1. the object being aliased is stored inside a union and 2. that the definition of that union type is in scope.
For what it's worth, the corresponding initial-subsequence relationship in C++ does not require a union. And in general, such union dependence would be an extremely pathological behavior for a compiler. If there's some way the existence of a union type could affect a concrete memory model, it's probably better not to try to picture it.
I suppose the intent is that a memory access verifier (think Valgrind on steroids) can check a potential aliasing error against these "strict" rules.
I want to expand on @Dietrich Epp's answer. Here is a quote from C99:
6.7.2.1 point 14
... A pointer to a union object, suitably converted, points to each of its members ... and vice versa.
Which means we can copy the memory from a struct to a union containing it:
struct a
{
int foo;
char bar;
};
struct b
{
int foo;
char bar;
};
union ab
{
struct a a;
struct b b;
};
void test(struct a *aa)
{
union ab ab;
memcpy(&ab, aa, sizeof *aa);
// ...
}
C99 also says:
6.5.2.3 point 5
One special guarantee is made in order to simplify the use of unions: if a union contains
several structures that share a common initial sequence ..., and if the union
object currently contains one of these structures, it is permitted to inspect the common
initial part of any of them anywhere that a declaration of the complete type of the union is
visible. Two structures share a common initial sequence if corresponding members have
compatible types .... for a sequence of one or more initial members.
Which means the following will also be legal after the memcpy:
ab.a.bar;
ab.b.bar;
The struct could be initialized in a separate translation unit and the copying is done in the standard library (out of the control of the compiler).
Thus, memcpy will copy byte-by-byte the value of the object of type struct a and the compiler has to ensure the result is valid for both structs.
The compiler cannot do anything other than generate instructions that read from the corresponding memory offset for both of those lines, thus the address needs to be the same.
Even though it is not stated explicitly, I would say the standard implies that C-structs with the same member types have the same layout in memory.

C inheritance through type punning, without containment?

I'm in a position where I need to get some object oriented features working in C, in particular inheritance. Luckily there are some good references on Stack Overflow, notably this Semi-inheritance in C: How does this snippet work? and this Object-orientation in C. The idea is to contain an instance of the base class within the derived class and typecast it, like so:
struct base {
int x;
int y;
};
struct derived {
struct base super;
int z;
};
struct derived d;
d.super.x = 1;
d.super.y = 2;
d.z = 3;
struct base *b = (struct base *)&d;
This is great, but it becomes cumbersome with deep inheritance trees - I'll have chains of about 5-6 "classes" and I'd really rather not type derived.super.super.super.super.super.super all the time. What I was hoping was that I could typecast to a struct of the first n elements, like this:
struct base {
int x;
int y;
};
struct derived {
int x;
int y;
int z;
};
struct derived d;
d.x = 1;
d.y = 2;
d.z = 3;
struct base *b = (struct base *)&d;
I've tested this on the C compiler that comes with Visual Studio 2012 and it works, but I have no idea if the C standard actually guarantees it. Is there anyone that might know for sure if this is ok? I don't want to write mountains of code only to discover it's broken at such a fundamental level.
What you describe here is a construct that was fully portable and would have been essentially guaranteed to work by the design of the language, except that the authors of the Standard didn't think it was necessary to explicitly mandate that compilers support things that should obviously work. C89 specified the Common Initial Sequence rule for unions, rather than pointers to structures, because given:
struct s1 {int x; int y; ... other stuff... };
struct s2 {int x; int y; ... other stuff... };
union u { struct s1 v1; struct s2 v2; };
code which received a struct s1* to an outside object that was either
a union u* or a malloc'ed object could legally cast it to a union u*
if it was aligned for that type, and it could legally cast the resulting
pointer to struct s2*, and the effect of accessing either struct s1* or struct s2* would have to be the same as accessing the union via either the v1 or v2 member. Consequently, the only way for a compiler to make all of the indicated rules work would be to say that converting a pointer of one structure type into a pointer of another type and using that pointer to inspect members of the Common Initial Sequence would work.
Unfortunately, compiler writers have said that the CIS rule is only applicable in cases where the underlying object has a union type, notwithstanding the fact that such a thing represents a very rare usage case (compared with situations where the union type exists for the purpose of letting the compiler know that pointers to the structures should be treated interchangeably for purposes of inspecting the CIS), and further since it would be rare for code to receive a struct s1* or struct s2* that identifies an object within a union u, they think they should be allowed to ignore that possibility. Thus, even if the above declarations are visible, gcc will assume that a struct s1* will never be used to access members of the CIS from a struct s2*.
By using pointers you can always create references to base classes at any level in the hierarchy. And if you use some kind of description of the inheritance structure, you can generate both the "class definitions" and factory functions needed as a build step.
#include <stdio.h>
#include <stdlib.h>
struct foo_class {
int a;
int b;
};
struct bar_class {
struct foo_class foo;
struct foo_class* base;
int c;
int d;
};
struct gazonk_class {
struct bar_class bar;
struct bar_class* base;
struct foo_class* Foo;
int e;
int f;
};
struct gazonk_class* gazonk_factory() {
struct gazonk_class* new_instance = malloc(sizeof(struct gazonk_class));
new_instance->bar.base = &new_instance->bar.foo;
new_instance->base = &new_instance->bar;
new_instance->Foo = &new_instance->bar.foo;
return new_instance;
}
int main(int argc, char* argv[]) {
struct gazonk_class* object = gazonk_factory();
object->Foo->a = 1;
object->Foo->b = 2;
object->base->c = 3;
object->base->d = 4;
object->e = 5;
object->f = 6;
fprintf(stdout, "%d %d %d %d %d %d\n",
object->base->base->a,
object->base->base->b,
object->base->c,
object->base->d,
object->e,
object->f);
return 0;
}
In this example you can either use base pointers to work your way back or directly reference a base class.
The address of a struct is the address of its first element, guaranteed.
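That guarantee is what makes the containment approach from the question workable even with deep hierarchies: small upcast helpers can hide the chain of super members. A sketch, assuming each "class" embeds its parent as its first member; the type struct leaf and the helper names are invented for illustration, and static_assert needs C11.

```c
#include <assert.h>
#include <stddef.h>

struct base    { int x; int y; };
struct derived { struct base super; int z; };
struct leaf    { struct derived super; int w; };

/* Assumption: the parent is always embedded as the first member. */
static_assert(offsetof(struct derived, super) == 0, "base must come first");
static_assert(offsetof(struct leaf, super) == 0, "derived must come first");

/* Hypothetical upcast helpers: they take the address of the embedded
   parent, so no cast between unrelated struct types is needed. */
struct base *derived_as_base(struct derived *d)
{
    assert(d != NULL);
    return &d->super;
}

struct base *leaf_as_base(struct leaf *l)
{
    assert(l != NULL);
    return &l->super.super;
}
```

Because each parent sits at offset zero, leaf_as_base(&l) compares equal to the address of l itself, and writes through the returned pointer are visible as l.super.super.x and l.super.super.y.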
