I have a struct X which inherits from struct Base. However, in my current setup, due to alignment, size of X is 24B:
typedef struct {
double_t a;
int8_t b;
} Base;
typedef struct {
Base base;
int8_t c;
} X;
In order to save the memory, I'd like to unwind the Base struct, so I created struct Y which contains fields from Base (in the same order, always at the beginning of the struct), so the size of the struct is 16B:
typedef struct {
double_t base_a;
int8_t base_b;
int8_t c;
} Y;
Then I'm going to use instance of struct Y in a method which expects a pointer to Base struct:
void print_base(Base* b)
{
printf("%f %d\n", b->a, b->b);
}
// ...
Y data;
print_base((Base*)&data);
Does the code above violates the strict aliasing rule, and causes undefined behavior?
First, Base and Y are not compatible types as defined by the standard 6.2.7, all members must match.
To access an Y through a Base* without creating a strict aliasing violation, Y needs to be "an aggregate type" (it is) that contains a Base type among its members. It does not.
So it is a strict aliasing violation and furthermore, since Y and Base are not compatible, they may have different memory layouts. Which is kind of the whole point, you made them different types for that very reason :)
What you can do in situations like this, is to use unions with struct members that share a common initial sequence, which is a special allowed case. Example of valid code from C11 6.5.2.3:
union {
struct {
int alltypes;
} n;
struct {
int type;
int intnode;
} ni;
struct {
int type;
double doublenode;
} nf;
} u;
u.nf.type = 1;
u.nf.doublenode = 3.14;
/* ... */
if (u.n.alltypes == 1)
if (sin(u.nf.doublenode) == 0.0)
Related
I have read a lot about type punning and how it is not good to just use a cast.
oldType* data = malloc(sizeof(oldType));
((newtype*)data)->newElement;
This results in undefined behavior. So the solution is to use union so that the compiler knows that these two pointers are linked to one another so it doesn't do funny things with strict aliasing.
That being said the unions also looked like:
union testing
{
struct test1 e;
struct test2 f;
}
Is it defined behavior if pointers are used in the union?
union testing
{
struct test1* e;
struct test2* f;
}
Here is a full example:
#include <stdio.h>
#include <stdlib.h>
struct test1
{
int a;
char b;
};
struct test2
{
int c;
char d;
};
union testing
{
struct test1* e;
struct test2* f;
};
void printer(const struct test2* value);
int main()
{
struct test1* aQuickTest = malloc(sizeof(struct test1));
aQuickTest->a = 42;
aQuickTest->b = 'a';
printer(((union testing)aQuickTest).f);
((union testing)aQuickTest.f)->c = 111; // using -> not .
return 0;
}
void printer(const struct test2* value)
{
printf("Int: %i Char: %c",value->c, value->d);
}
Or would I need to use unions without pointers. And then use printer(&(((union testing)aQuickTest).f)); (with the &) to get the address of f.
It is non-conforming to cast to a union type, as your code does:
printer(((union testing)aQuickTest).f);
For that reason, your code does have undefined behavior as far as the Standard is concerned.
More directly to the point, however, no, your approach of putting pointers into a union does not avoid strict aliasing violations with respect to the pointed-to types, even without the casting issue. In your case, the effect is that where your union testing is in scope, implementations cannot assume that objects of type struct test1 ** and struct test2 ** do not alias each other. That does not prevent undefined defined behavior resulting from accessing an object with effective type struct test1 through an lvalue of type struct test2.
Suppose you want to type pun types X and Y, you should use the union -
typedef union {
X x;
Y y;
}X_Y;
This allows you to share the bit representation of X with Y and vice versa.
If you use -
typedef union {
X* x;
Y* y;
}X_Y_p;
you are sharing the bit representations for the pointer. For a system that uses the same bit representation for all pointer, you are essentially casting pointer of X to pointer of Y, which you identified causes Undefined Behaviour.
It is not illegal to have something X_Y_p because X* and Y* are types by themselves. But they achieve something different. They let you type pun pointers, which is not what you want to do (and not necessary in most cases, because pointers share representation on most systems). A cast should be fine there.
If I have these structures:
typedef struct { int x; } foo;
typedef struct { foo f; } bar;
Normally you would access x through b.f.x, but is there a way to set this up so that you can access element x without referring to f?
bar b;
b.x = ...
My first intuition is that you can't since there would be a possibility for name conflicts if two sub structures both had a member x and I can't figure out what the compile error would be. However, I recall working in some frameworks where this was possible.
In C++ I worked in a framework once where bar existed, and you could access its members as member variables this->x from a different class. I'm trying to figure out how that could be done.
You can with C11:
§ 6.7.2.1 -- 11
An unnamed member whose type specifier is a structure specifier with no tag is called an
anonymous structure; an unnamed member whose type specifier is a union specifier with
no tag is called an anonymous union. The members of an anonymous structure or union
are considered to be members of the containing structure or union. This applies
recursively if the containing structure or union is also anonymous.
So this code might work:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct { foo; } bar;
int main(void)
{
bar b;
b.x = 1;
printf("%d\n", b.x);
}
The problem here is that different compilers disagree in my tests on whether a typedef is acceptable as a struct specifier with no tag The standard specifies:
§ 6.7.8 -- 3
In a declaration whose storage-class specifier is typedef, each declarator defines an
identifier to be a typedef name that denotes the type specified for the identifier in the way
described in 6.7.6. [...] A typedef declaration does not introduce a new type, only a
synonym for the type so specified.
(emphasis mine) -- But does synonym also mean a typdef-name specifier is exchangeable for a struct specifier? gcc accepts this, clang doesn't.
Of course, there's no way to express the whole member of type foo with these declarations, you sacrifice your named member f.
Concerning your doubt about name collisions, this is what gcc has to say when you put another int x inside bar:
structinherit.c:4:27: error: duplicate member 'x'
typedef struct { foo; int x; } bar;
^
To avoid ambiguity, you can just repeat the struct, possibly #defined as a macro, but of course, this looks a bit ugly:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct { struct { int x; }; } bar;
int main(void)
{
bar b;
b.x = 1;
printf("%d\n", b.x);
}
But any conforming compiler should accept this code, so stick to this version.
<opinion>This is a pity, I like the syntax accepted by gcc much better, but as the wording of the standard doesn't make it explicit to allow this, the only safe bet is to assume it's forbidden, so clang is not to blame here...</opinion>
If you want to refer to x by either b.x or b.f.x, you can use an additional anonymous union like this:
#include <stdio.h>
typedef struct { int x; } foo;
typedef struct {
union { struct { int x; }; foo f; };
} bar;
int main(void)
{
bar b;
b.f.x = 2;
b.x = 1;
printf("%d\n", b.f.x); // <-- guaranteed to print 1
}
This will not cause aliasing issues because of
§ 6.5.2.3 -- 6
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members
C: Highly unrecommended, but doable:
#include <stdio.h>
#define BAR_STRUCT struct { int x; }
typedef BAR_STRUCT bar;
typedef struct {
union {
bar b;
BAR_STRUCT;
};
} foo;
int main() {
foo f;
f.x = 989898;
printf("%d %d", f.b.x, f.x);
return 0;
}
Anonymous structs are a widly-spread extension in standards before C11.
C++:
The same as in C, you can do here but anonymous structs are not part of any C++ standard, but an extension.
Better use inheritance, or do not use this shortcut at all.
Of course, do not use something like #define x b.x)).
In C you can't access members of members like this.
You can however access members of an anonymous inner struct:
struct bar {
struct {
int x;
}
};
...
struct bar b;
b.x = 1;
In C++ you use inheritance:
struct foo {
int x;
};
struct bar: public foo {
};
...
struct bar b;
b.x = 1;
In C (99 and onward) you can access the common initial sub-sequence of union members, even if they weren't the last member written to1.
In C11, you can have anonymous union members. So:
typedef struct { int x; } foo;
typedef struct {
union {
foo f;
int x;
};
} bar;
Yes, that applies to structures. But according to the standard:
A structure pointer, suitably converted, points to the first member.
A union pointer, suitably converted, points to any union member.
So their location in memory is the same.
This is not possible in C. In C++ however you can use inheritance which is probably what you were thinking about.
In C++, you can use inheritance and member name conflicts are sort of resolvable with :: and treating the base classes as members.
struct foo { int x; };
struct bar : foo { };
struct foo1 { int x; };
struct bar1 : foo1 { char const* x; };
bar b;
bar1 b1;
int main()
{
return b.x + b1.foo1::x;
}
In standard C, it's impossible, however several compilers (gcc, clang, tinycc) support a similar thing as an extension (usually accessible with -fms-extensions (on gcc also with -fplan9-extensions which is a superset of -fms-extensions)), which allows you to do:
struct foo { int x; };
struct bar { struct foo; };
struct bar b = { 42 };
int main()
{
return b.x;
}
However, there's no resolution for conflicting member names with it, AFAIK.
In C++, it is possible in two ways. The first is to use inheritence. The second is for bar to contain a reference member named x (int &x), and constructors that initialise x to refer to f.x.
In C, it is not possible.
Since the C standard guarantees that there isn't padding before the first member of a struct, there isn't padding before the foo in bar, and there isn't padding before the x in foo. So, a raw memory access to the start of bar will access bar::foo::x.
You could do something like this:
#include <stdio.h>
#include <stdlib.h>
typedef struct _foo
{
int x;
} foo;
typedef struct _bar
{
foo f;
} bar;
int main()
{
bar b;
int val = 10;
// Setting the value:
memcpy(&b, &val, sizeof(int));
printf("%d\n", b.f.x);
b.f.x = 100;
// Reading the value:
memcpy(&val, &b, sizeof(int));
printf("%d\n", val);
return 0;
}
As others have noted, C++ offers a more elegant way of doing this through inheritance.
The typical C99 way to extending stuct is something like
struct Base {
int x;
/* ... */
};
struct Derived {
struct Base base_part;
int y;
/* ... */
};
Then we may cast instance of struct Derived * to struct Base * and then access x.
I want to access base elements of struct Derived * obj; directly, for example obj->x and obj->y. C11 provide extended structs, but as explained here we can use this feature only with anonymous definitions. Then how about to write
#define BASE_BODY { \
int x; \
}
struct Base BASE_BODY;
struct Derived {
struct BASE_BODY;
int y;
};
Then I may access Base members same as it's part of Derived without any casts or intermediate members. I can cast Derived pointer to Base pointer if need do.
Is this acceptable? Are there any pitfalls?
There are pitfalls.
Consider:
#define BASE_BODY { \
double a; \
short b; \
}
struct Base BASE_BODY;
struct Derived {
struct BASE_BODY;
short c;
};
On some implementation it could be that sizeof(Base) == sizeof(Derived), but:
struct Base {
double a;
// Padding here
short b;
}
struct Derived {
double a;
short b;
short c;
};
There is no guarantee that at the beginning of the struct memory layout is the same. Therefore you cannot pass this kind of Derived * to function expecting Base *, and expect it to work.
And even if padding would not mess up the layout, there is a still potential problem with trap presenstation:
If again sizeof(Base) == sizeof(Derived), but c ends up to a area which is covered by the padding at the end of Base. Passing pointer of this struct to function which expects Base* and modifies it, might affect padding bits too (padding has unspecified value), thus possibly corrupting c and maybe even creating trap presentation.
I'm in a position where I need to get some object oriented features working in C, in particular inheritance. Luckily there are some good references on stack overflow, notably this Semi-inheritance in C: How does this snippet work? and this Object-orientation in C. The the idea is to contain an instance of the base class within the derived class and typecast it, like so:
struct base {
int x;
int y;
};
struct derived {
struct base super;
int z;
};
struct derived d;
d.super.x = 1;
d.super.y = 2;
d.z = 3;
struct base b = (struct base *)&d;
This is great, but it becomes cumbersome with deep inheritance trees - I'll have chains of about 5-6 "classes" and I'd really rather not type derived.super.super.super.super.super.super all the time. What I was hoping was that I could typecast to a struct of the first n elements, like this:
struct base {
int x;
int y;
};
struct derived {
int x;
int y;
int z;
};
struct derived d;
d.x = 1;
d.y = 2;
d.z = 3;
struct base b = (struct base *)&d;
I've tested this on the C compiler that comes with Visual Studio 2012 and it works, but I have no idea if the C standard actually guarantees it. Is there anyone that might know for sure if this is ok? I don't want to write mountains of code only to discover it's broken at such a fundamental level.
What you describe here is a construct that was fully portable and would have been essentially guaranteed to work by the design of the language, except that the authors of the Standard didn't think it was necessary to explicitly mandate that compilers support things that should obviously work. C89 specified the Common Initial Sequence rule for unions, rather than pointers to structures, because given:
struct s1 {int x; int y; ... other stuff... };
struct s2 {int x; int y; ... other stuff... };
union u { struct s1 v1; struct s2 v2; };
code which received a struct s1* to an outside object that was either
a union u* or a malloc'ed object could legally cast it to a union u*
if it was aligned for that type, and it could legally cast the resulting
pointer to struct s2*, and the effect of using accessing either struct s1* or struct s2* would have to be the same as accessing the union via either the v1 or v2 member. Consequently, the only way for a compiler to make all of the indicated rules work would be to say that converting a pointer of one structure type into a pointer of another type and using that pointer to inspect members of the Common Initial Sequence would work.
Unfortunately, compiler writers have said that the CIS rule is only applicable in cases where the underlying object has a union type, notwithstanding the fact that such a thing represents a very rare usage case (compared with situations where the union type exists for the purpose of letting the compiler know that pointers to the structures should be treated interchangeably for purposes of inspecting the CIS), and further since it would be rare for code to receive a struct s1* or struct s2* that identifies an object within a union u, they think they should be allowed to ignore that possibility. Thus, even if the above declarations are visible, gcc will assume that a struct s1* will never be used to access members of the CIS from a struct s2*.
By using pointers you can always create references to base classes at any level in the hierarchy. And if you use some kind of description of the inheritance structure, you can generate both the "class definitions" and factory functions needed as a build step.
#include <stdio.h>
#include <stdlib.h>
struct foo_class {
int a;
int b;
};
struct bar_class {
struct foo_class foo;
struct foo_class* base;
int c;
int d;
};
struct gazonk_class {
struct bar_class bar;
struct bar_class* base;
struct foo_class* Foo;
int e;
int f;
};
struct gazonk_class* gazonk_factory() {
struct gazonk_class* new_instance = malloc(sizeof(struct gazonk_class));
new_instance->bar.base = &new_instance->bar.foo;
new_instance->base = &new_instance->bar;
new_instance->Foo = &new_instance->bar.foo;
return new_instance;
}
int main(int argc, char* argv[]) {
struct gazonk_class* object = gazonk_factory();
object->Foo->a = 1;
object->Foo->b = 2;
object->base->c = 3;
object->base->d = 4;
object->e = 5;
object->f = 6;
fprintf(stdout, "%d %d %d %d %d %d\n",
object->base->base->a,
object->base->base->b,
object->base->c,
object->base->d,
object->e,
object->f);
return 0;
}
In this example you can either use base pointers to work your way back or directly reference a base class.
The address of a struct is the address of its first element, guaranteed.
C11 adds, among other things, 'Anonymous Structs and Unions'.
I poked around but could not find a clear explanation of when anonymous structs and unions would be useful. I ask because I don't completely understand what they are. I get that they are structs or unions without the name afterwards, but I have always (had to?) treat that as an error so I can only conceive a use for named structs.
Anonymous union inside structures are very useful in practice. Consider that you want to implement a discriminated sum type (or tagged union), an aggregate with a boolean and either a float or a char* (i.e. a string), depending upon the boolean flag. With C11 you should be able to code
typedef struct {
bool is_float;
union {
float f;
char* s;
};
} mychoice_t;
double as_float(mychoice_t* ch)
{
if (ch->is_float) return ch->f;
else return atof(ch->s);
}
With C99, you'll have to name the union, and code ch->u.f and ch->u.s which is less readable and more verbose.
Another way to implement some tagged union type is to use casts. The Ocaml runtime gives a lot of examples.
The SBCL implementation of Common Lisp does use some union to implement tagged union types. And GNU make also uses them.
A typical and real world use of anonymous structs and unions are to provide an alternative view to data. For example when implementing a 3D point type:
typedef struct {
union{
struct{
double x;
double y;
double z;
};
double raw[3];
};
}vec3d_t;
vec3d_t v;
v.x = 4.0;
v.raw[1] = 3.0; // Equivalent to v.y = 3.0
v.z = 2.0;
This is useful if you interface to code that expects a 3D vector as a pointer to three doubles. Instead of doing f(&v.x) which is ugly, you can do f(v.raw) which makes your intent clear.
struct bla {
struct { int a; int b; };
int c;
};
the type struct bla has a member of a C11 anonymous structure type.
struct { int a; int b; } has no tag and the object has no name: it is an anonymous structure type.
You can access the members of the anonymous structure this way:
struct bla myobject;
myobject.a = 1; // a is a member of the anonymous structure inside struct bla
myobject.b = 2; // same for b
myobject.c = 3; // c is a member of the structure struct bla
Another useful implementation is when you are dealing with rgba colors, since you might want access each color on its own or as a single int.
typedef struct {
union{
struct {uint8_t a, b, g, r;};
uint32_t val;
};
}Color;
Now you can access the individual rgba values or the entire value, with its highest byte being r. i.e:
int main(void)
{
Color x;
x.r = 0x11;
x.g = 0xAA;
x.b = 0xCC;
x.a = 0xFF;
printf("%X\n", x.val);
return 0;
}
Prints 11AACCFF
I'm not sure why C11 allows anonymous structures inside structures. But Linux uses it with a certain language extension:
/**
* struct blk_mq_ctx - State for a software queue facing the submitting CPUs
*/
struct blk_mq_ctx {
struct {
spinlock_t lock;
struct list_head rq_lists[HCTX_MAX_TYPES];
} ____cacheline_aligned_in_smp;
/* ... other fields without explicit alignment annotations ... */
} ____cacheline_aligned_in_smp;
I'm not sure if that example strictly necessary, except to make the intent clear.
EDIT: I found another similar pattern which is more clear-cut. The anonymous struct feature is used with this attribute:
#if defined(RANDSTRUCT_PLUGIN) && !defined(__CHECKER__)
#define __randomize_layout __attribute__((randomize_layout))
#define __no_randomize_layout __attribute__((no_randomize_layout))
/* This anon struct can add padding, so only enable it under randstruct. */
#define randomized_struct_fields_start struct {
#define randomized_struct_fields_end } __randomize_layout;
#endif
I.e. a language extension / compiler plugin to randomize field order (ASLR-style exploit "hardening"):
struct kiocb {
struct file *ki_filp;
/* The 'ki_filp' pointer is shared in a union for aio */
randomized_struct_fields_start
loff_t ki_pos;
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void *private;
int ki_flags;
u16 ki_hint;
u16 ki_ioprio; /* See linux/ioprio.h */
unsigned int ki_cookie; /* for ->iopoll */
randomized_struct_fields_end
};
Well, if you declare variables from that struct only once in your code, why does it need a name?
struct {
int a;
struct {
int b;
int c;
} d;
} e,f;
And you can now write things like e.a,f.d.b,etc.
(I added the inner struct, because I think that this is one of the most usages of anonymous structs)