How to automatically pad bytes in a structure to a specific alignment? - c

For example, I have the following structure:
typedef struct st
{
char x1;
char x2;
...
char y;
}*st_p;
This definition of structure st may vary over time (it is not guaranteed what will be defined inside ... as development goes on).
Is there a way to guarantee that char y is always aligned on – for example – a 256-byte boundary?
I guess the solution is something like adding an array like the following:
typedef struct st
{
char x1;
char x2;
...
char padding[keyword];
char y;
}*st_p;
Is there such a keyword which calculates the total size of all previous members in this structure and which thereby helps resolve the number of bytes to pad?

Since C11, the C language includes the _Alignas alignment specifier [1], with which you can specify a desired (minimum) alignment requirement for any object, including a member of a structure.
So, in your case, you can skip the 'manual' padding and just add that attribute to the char y; declaration:
typedef struct st {
char x1;
char x2;
//...
_Alignas(256) char y;
}*st_p;
[1] However, note that the C Standard does not require that an implementation supports so-called "extended alignment" – that is, an alignment requirement that is greater than that of max_align_t (typically 8 or 16 bytes). What it does require, though, is that the specified alignment be a positive power of 2 (which is true, in your case).
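A minimal sketch of how the _Alignas approach can be checked at compile time. It assumes a compiler that accepts an extended alignment of 256 (GCC and Clang do on common targets); the x3 member and the y_offset helper are hypothetical, added only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

struct st {
    char x1;
    char x2;
    char x3[714];          /* hypothetical stand-in for the "..." members */
    _Alignas(256) char y;  /* requests 256-byte alignment for y */
};

/* The compiler inserts the padding itself, so y sits at a multiple of 256... */
static_assert(offsetof(struct st, y) % 256 == 0, "y offset is not a multiple of 256");
/* ...and the structure takes on the strictest alignment of its members. */
static_assert(_Alignof(struct st) == 256, "struct st is not 256-byte aligned");

/* Hypothetical helper: reports where y ended up. */
size_t y_offset(void) { return offsetof(struct st, y); }
```

If the leading members grow or shrink later, the assertions keep holding without any manual padding arithmetic.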
As an addendum/alternative, I shall also address your direct question:
Is there such a keyword which calculates the total size of all previous members in this structure and which thereby helps resolve the number of bytes to pad?
The 'keyword' (actually, an operator) that could be used here is sizeof – so long as you put "all previous members" inside an anonymous struct. Here's an example:
#include <stdio.h>
#include <stddef.h>
typedef struct st {
struct inner {
char x1;
char x2;
char x3[714];
//...
};
char padding[256 - sizeof(struct inner) % 256];
char y;
}*st_p;
int main()
{
struct st test;
test.x1 = 'A'; // You can access the anonymous structure members as though
test.x2 = 'B'; // they were 'direct' members of the enclosing structure.
printf("%c %c %zu\n", test.x1, test.x2, offsetof(struct st, y));// Offset is 768
return 0;
}
But note that, in this code, although the offset of y will be a multiple of 256 bytes, that doesn't guarantee that y will be 256-byte aligned. That will only be true in an instance of the st structure that is itself aligned on a 256-byte boundary … and to ensure that, you will need an alignment attribute on the structure itself.
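The caveat above can be sketched as follows. This variant is an assumption-laden illustration, not the answer's code: the leading members are moved into a separately named struct (st_head, a hypothetical name) so that sizeof can be applied in standard C, and _Alignas on the first member aligns every struct st object. It also avoids the tagged `struct inner { ... };` member, which GCC, for instance, treats as an anonymous member only with -fms-extensions.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t */

#define ALIGNMENT 256

/* Hypothetical: the leading members collected in a named struct. */
struct st_head {
    char x1;
    char x2;
    char x3[714];
};

struct st {
    _Alignas(ALIGNMENT) struct st_head head;  /* aligns the whole structure */
    char padding[ALIGNMENT - sizeof(struct st_head) % ALIGNMENT];
    char y;
};

static_assert(offsetof(struct st, y) % ALIGNMENT == 0, "bad offset for y");

/* Returns 1 if p is a multiple of ALIGNMENT (assumes a flat address space). */
int is_aligned(const void *p) { return (uintptr_t)p % ALIGNMENT == 0; }

/* Returns 1 if y inside a static object is really 256-byte aligned. */
int y_is_aligned(void) {
    static struct st object;
    return is_aligned(&object.y);
}
```

The price of this variant is that the leading members are reached as object.head.x1 rather than object.x1.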

It is possible to use compiler attributes to align a particular structure field to a specific boundary. For example, if you are using GCC, you can use:
__attribute__((aligned(n)))
Thus, your example would become:
typedef struct st
{
char x1;
char x2;
...
char __attribute__((aligned(256))) y;
}*st_p;
If you have to do this at several places, you can even imagine defining a special type for this, something like:
typedef char __attribute__((aligned(256))) aligned_char;
typedef struct st
{
char x1;
char x2;
...
aligned_char y;
}*st_p;
This is not limited to structure fields; you can also apply it to local or global variables.
However, keep in mind that this may not work with all compilers, as the attributes are bound to the compiler and not to the C language.
If you are using MSVC, you may be able to replicate the behavior with
__declspec(align(x))
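One way to keep both spellings in a single source file is a small wrapper macro. This is only a sketch: ALIGNED is a hypothetical name, not something either compiler provides, and compilers other than MSVC are simply assumed to understand the GCC attribute syntax.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical wrapper macro selecting the vendor-specific spelling. */
#if defined(_MSC_VER)
#  define ALIGNED(n) __declspec(align(n))
#else
#  define ALIGNED(n) __attribute__((aligned(n)))  /* GCC, Clang and compatibles */
#endif

struct st {
    char x1;
    char x2;
    ALIGNED(256) char y;  /* padding before y is inserted by the compiler */
};

/* Hypothetical helper: reports the offset the compiler chose for y. */
size_t st_y_offset(void) { return offsetof(struct st, y); }
```

With only two char members in front, the compiler places y at the first 256-byte boundary.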

You could use a union, although then there is no .padding[] member. You could also add a check that the offset is correct, in case the first struct is larger than the desired offset.
#include <stdio.h>  /* printf */
#include <stddef.h> /* offsetof */
#define OFFSET 256 /* Does not need to be a power-of-2 */
typedef struct st {
union {
struct {
char x1;
char x2;
};
struct {
char dummy[OFFSET];
};
};
char y;
} st;
void bar(void) {
st o1;
o1.x1 = '1';
o1.x2 = '2';
o1.y = 'y';
printf("%p\n", (void *) &o1);
printf("%p\n", (void *) &o1.y);
}
_Static_assert(offsetof(struct st, y) == OFFSET, "TBD code");
Sample output
0xffffcac0
0xffffcbc0
This does not ensure that the alignment is 256, only that the offset of member y is 256, which I think is OP's true goal. @Adrian Mole

Related

struct similarity in C

Consider the two structs below:
struct A {
double x[3];
double y[3];
int z[3];
struct A *a;
int b;
struct A *c;
unsigned d[10];
};
struct B {
double x[3];
double y[3];
int z[3];
};
Notice that struct B is a strict subset of struct A. Now, I want to copy the members .x, .y and .z from an instance of struct A to an instance of struct B. My question is: according to the standards, is it valid to do:
struct A s_a = ...;
struct B s_b;
memcpy(&s_b, &s_a, sizeof s_b);
I.e. is it guaranteed that the paddings for the members, in their sequence of appearance, will be the same, so that I can "partially" memcpy struct A to struct B?
It is not guaranteed that struct A's layout starts off the same as struct B's layout.
However, if and only if they were both members of a union:
union X
{
struct A a;
struct B b;
};
then it is guaranteed that the common initial sequence has the same layout.
I've never heard of any compiler that would lay out a struct differently if it detected that the struct were a member of a union, so in practice you should be safe!
How about using struct B as an anonymous struct member of struct A. This requires, however, -fms-extensions for gcc (there should be a similar extension for VC as the name implies):
struct B {
double x[3];
double y[3];
int z[3];
};
struct A {
struct B;
struct A *a;
int b;
struct A *c;
unsigned d[10];
};
This allows you to use the fields in struct A like:
struct A as;
as.x[2] = as.y[0];
etc. This guarantees identical layout (the standard allows no padding at the beginning of a struct, so the inner struct is guaranteed to start at the same address as the outer) and makes struct A cast-compatible with struct B.
Also:
struct A as;
struct B bs;
memcpy(&as, &bs, sizeof(bs));
I do not think the Standard would prohibit an implementation from including so much more padding in s_a than s_b that the former is actually larger even though its members are a subset of s_b's. Such behavior would be very weird, and I can't think of any reason why a compiler would do such a thing, but I don't think it would be prohibited.
If the number of bytes copied is the lesser of sizeof s_a and sizeof s_b, then the memcpy operation will be guaranteed to copy all of the common fields, but would not necessarily leave the later fields of s_b undisturbed. On a typical machine, if the declarations had been:
struct A { uint32_t x; char y; };
struct B { uint32_t x; char y,p; uint16_t q; };
the first structure would contain five bytes of data and three bytes of padding, while the second would contain eight bytes of data with no padding. Using memcpy as shown in your code would copy the padding from s_a over the data in s_b.
If you need to copy the initial structure members while leaving the balance of the structure undisturbed, you should add the offset and size of the last member of interest, and use that sum as the number of bytes to copy. In the example I give above, the offset of y would be 4, and its size would be 1, so the memcpy would copy 5 bytes and thus ignore parts of the structure that are used as padding in A but might hold data in B.
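The offset-plus-size computation can be sketched with offsetof and sizeof. The COMMON_LEN macro and the copy_common helper are hypothetical names; the static assertions encode the assumption that both structures lay out x and y identically, which the Standard does not promise outside a union.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* memcpy */

struct A { uint32_t x; char y; };
struct B { uint32_t x; char y, p; uint16_t q; };

/* Bytes from the start of struct A through the end of its member y. */
#define COMMON_LEN (offsetof(struct A, y) + sizeof(((struct A *)0)->y))

/* The partial copy only makes sense if both layouts agree on the prefix. */
static_assert(offsetof(struct A, x) == offsetof(struct B, x), "x is laid out differently");
static_assert(offsetof(struct A, y) == offsetof(struct B, y), "y is laid out differently");

/* Hypothetical helper: copies x and y, leaving p and q of *dst untouched. */
void copy_common(struct B *dst, const struct A *src) {
    memcpy(dst, src, COMMON_LEN);
}
```

Because the copy stops right after y, the padding bytes of struct A never reach the p and q members of struct B.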

Using macro in C11 anonymous struct definition

The typical C99 way to extend a struct is something like
struct Base {
int x;
/* ... */
};
struct Derived {
struct Base base_part;
int y;
/* ... */
};
Then we may cast a struct Derived * to struct Base * and then access x.
I want to access base elements of struct Derived * obj; directly, for example obj->x and obj->y. C11 provides extended structs, but as explained here we can use this feature only with anonymous definitions. Then how about writing
#define BASE_BODY { \
int x; \
}
struct Base BASE_BODY;
struct Derived {
struct BASE_BODY;
int y;
};
Then I can access the Base members as if they were part of Derived, without any casts or intermediate members. I can cast a Derived pointer to a Base pointer if need be.
Is this acceptable? Are there any pitfalls?
There are pitfalls.
Consider:
#define BASE_BODY { \
double a; \
short b; \
}
struct Base BASE_BODY;
struct Derived {
struct BASE_BODY;
short c;
};
On some implementations it could be that sizeof(Base) == sizeof(Derived), but:
struct Base {
double a;
// Padding here
short b;
};
struct Derived {
double a;
short b;
short c;
};
There is no guarantee that the memory layout at the beginning of the two structs is the same. Therefore you cannot pass this kind of Derived * to a function expecting Base *, and expect it to work.
And even if padding does not mess up the layout, there is still a potential problem with trap representations:
Suppose again that sizeof(Base) == sizeof(Derived), but c ends up in an area which is covered by the padding at the end of Base. Passing a pointer to this struct to a function which expects Base * and modifies it might affect the padding bits too (padding has unspecified value), thus possibly corrupting c and maybe even creating a trap representation.
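The layout assumptions behind the macro trick can at least be stated to the compiler. The following is a sketch, not a guarantee of portability: it reuses the BASE_BODY macro from above and adds static assertions that fail the build on an implementation where the layouts diverge. The derived_c_offset helper is a hypothetical name.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */

#define BASE_BODY { \
    double a;       \
    short b;        \
}

struct Base BASE_BODY;

struct Derived {
    struct BASE_BODY;   /* C11 anonymous struct: a and b become direct members */
    short c;
};

/* The shared members must sit at the same offsets in both types. */
static_assert(offsetof(struct Derived, a) == offsetof(struct Base, a), "a moved");
static_assert(offsetof(struct Derived, b) == offsetof(struct Base, b), "b moved");
/* c must not live in bytes that a whole-struct write of Base could touch. */
static_assert(offsetof(struct Derived, c) >= sizeof(struct Base), "c overlaps Base");

/* Hypothetical helper: reports where c was placed. */
size_t derived_c_offset(void) { return offsetof(struct Derived, c); }
```

These checks only document the layout; they do not address the aliasing question of accessing a struct Derived through a struct Base *.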

C inheritance through type punning, without containment?

I'm in a position where I need to get some object oriented features working in C, in particular inheritance. Luckily there are some good references on Stack Overflow, notably this Semi-inheritance in C: How does this snippet work? and this Object-orientation in C. The idea is to contain an instance of the base class within the derived class and typecast it, like so:
struct base {
int x;
int y;
};
struct derived {
struct base super;
int z;
};
struct derived d;
d.super.x = 1;
d.super.y = 2;
d.z = 3;
struct base *b = (struct base *)&d;
This is great, but it becomes cumbersome with deep inheritance trees - I'll have chains of about 5-6 "classes" and I'd really rather not type derived.super.super.super.super.super.super all the time. What I was hoping was that I could typecast to a struct of the first n elements, like this:
struct base {
int x;
int y;
};
struct derived {
int x;
int y;
int z;
};
struct derived d;
d.x = 1;
d.y = 2;
d.z = 3;
struct base *b = (struct base *)&d;
I've tested this on the C compiler that comes with Visual Studio 2012 and it works, but I have no idea if the C standard actually guarantees it. Is there anyone that might know for sure if this is ok? I don't want to write mountains of code only to discover it's broken at such a fundamental level.
What you describe here is a construct that was fully portable and would have been essentially guaranteed to work by the design of the language, except that the authors of the Standard didn't think it was necessary to explicitly mandate that compilers support things that should obviously work. C89 specified the Common Initial Sequence rule for unions, rather than pointers to structures, because given:
struct s1 {int x; int y; ... other stuff... };
struct s2 {int x; int y; ... other stuff... };
union u { struct s1 v1; struct s2 v2; };
code which received a struct s1* to an outside object that was either a union u* or a malloc'ed object could legally cast it to a union u* if it was aligned for that type, and it could legally cast the resulting pointer to struct s2*, and the effect of accessing either struct s1* or struct s2* would have to be the same as accessing the union via either the v1 or v2 member. Consequently, the only way for a compiler to make all of the indicated rules work would be to say that converting a pointer of one structure type into a pointer of another type and using that pointer to inspect members of the Common Initial Sequence would work.
Unfortunately, compiler writers have said that the CIS rule is only applicable in cases where the underlying object has a union type, notwithstanding the fact that such a thing represents a very rare usage case (compared with situations where the union type exists for the purpose of letting the compiler know that pointers to the structures should be treated interchangeably for purposes of inspecting the CIS), and further since it would be rare for code to receive a struct s1* or struct s2* that identifies an object within a union u, they think they should be allowed to ignore that possibility. Thus, even if the above declarations are visible, gcc will assume that a struct s1* will never be used to access members of the CIS from a struct s2*.
By using pointers you can always create references to base classes at any level in the hierarchy. And if you use some kind of description of the inheritance structure, you can generate both the "class definitions" and factory functions needed as a build step.
#include <stdio.h>
#include <stdlib.h>
struct foo_class {
int a;
int b;
};
struct bar_class {
struct foo_class foo;
struct foo_class* base;
int c;
int d;
};
struct gazonk_class {
struct bar_class bar;
struct bar_class* base;
struct foo_class* Foo;
int e;
int f;
};
struct gazonk_class* gazonk_factory() {
struct gazonk_class* new_instance = malloc(sizeof(struct gazonk_class));
new_instance->bar.base = &new_instance->bar.foo;
new_instance->base = &new_instance->bar;
new_instance->Foo = &new_instance->bar.foo;
return new_instance;
}
int main(int argc, char* argv[]) {
struct gazonk_class* object = gazonk_factory();
object->Foo->a = 1;
object->Foo->b = 2;
object->base->c = 3;
object->base->d = 4;
object->e = 5;
object->f = 6;
fprintf(stdout, "%d %d %d %d %d %d\n",
object->base->base->a,
object->base->base->b,
object->base->c,
object->base->d,
object->e,
object->f);
return 0;
}
In this example you can either use base pointers to work your way back or directly reference a base class.
The address of a struct is the address of its first element, guaranteed.
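That first-member guarantee is what makes the containment approach from the question safe. A minimal sketch, reusing the question's struct base and struct derived; base_sum and derived_sum are hypothetical helper names.

```c
#include <assert.h>

struct base { int x; int y; };
struct derived { struct base super; int z; };   /* base must be the first member */

/* Works on anything that starts with a struct base. */
int base_sum(const struct base *b) { return b->x + b->y; }

int derived_sum(const struct derived *d) {
    /* A pointer to a struct, suitably converted, points to its first member,
       so both spellings name the same sub-object. */
    const struct base *via_member = &d->super;
    const struct base *via_cast = (const struct base *)d;
    assert(via_member == via_cast);
    return base_sum(via_cast) + d->z;
}
```

With deeper hierarchies the same cast reaches the innermost base directly, which avoids typing out the chain of .super members.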

How to access members through a void pointer

I started by trying to write a small program to translate basic arithmetic into English, and ended up building a binary tree (which is inevitably very unbalanced) to represent the order of evaluation.
First, I wrote
typedef struct expr expr;
struct expr{
unsigned char entity_flag; /*positive when the oprd
struct represents an entity
---a single digit or a parenthesized block*/
char oprt;
expr * l_oprd;// these two point to the children nodes
expr * r_oprd;
};
However, to efficiently represent single digits, I prefer
typedef struct{
unsigned char entity_flag;
int ival;
} digit;
Since the "oprd" fields of each "expr" struct may now point to either of the above structs, I shall change their types to
void * l_oprd;
void * r_oprd;
Then there comes the "central question":
how can you access members through a void pointer?
Please see the following code:
#include<stdio.h>
#include<stdlib.h>
typedef struct {
int i1;
int i2;} s;
int main(void){
void* p=malloc(sizeof(s));
//p->i1=1;
//p->i2=2;
*(int*)p=1;
*((int *)p+1)=2;
printf("s{i1:%d, i2: %d}\n",*(int*)p,*((int *)p+1));
}
The compiler wouldn't accept the commented version!
Do I have to do it with the cluttered approach above?
please help.
PS: as you have noticed, each of the structs above possesses a field named "entity_flag", thus
void * vp;
...(giving some value to vp)
unsigned char flag=vp->entity_flag;
may extract the flag regardless of what void points to, is this allowed in C? or even "safe" in C?
You can't access members through void * pointers. There are ways you could cast it (indeed, you don't even need to state the cast explicitly with void *), but even that is the wrong answer.
The correct answer is to use union:
typedef union expr expr; /* forward declaration so the members can use expr * */
union expr {
struct{
unsigned char entity_flag; /*positive when the oprd
struct represents an entity
---a single digit or a parenthesized block*/
char oprt;
expr * l_oprd;// these two point to the children nodes
expr * r_oprd;
} expr;
struct{
unsigned char entity_flag;
int ival;
} digit;
};
You then access an expression like this (given a variable expr *e):
e->expr.entity_flag;
And a digit like this:
e->digit.entity_flag;
Any other solution is a nasty hack, IMO, and most of the casting solutions will risk breaking the "strict aliasing" rules that say that the compiler is allowed to assume that two pointers of different types can't reference the same memory.
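To show how the union is meant to be used, here is a small sketch of an evaluator over such a tree. It is an illustration under assumptions: the operator member is called op here (a hypothetical name, to keep it distinct from the type name), entity_flag is taken to be nonzero for digits, and eval is a hypothetical helper that handles only three operators.

```c
#include <assert.h>

typedef union expr expr;

union expr {
    struct {
        unsigned char entity_flag;  /* 0: operator node */
        char oprt;
        expr *l_oprd;               /* children of the operator node */
        expr *r_oprd;
    } op;
    struct {
        unsigned char entity_flag;  /* nonzero: single digit */
        int ival;
    } digit;
};

/* Hypothetical helper: recursively evaluates the tree rooted at e. */
int eval(const expr *e) {
    /* entity_flag belongs to the common initial sequence of both members,
       so it may be inspected through either one. */
    if (e->digit.entity_flag)
        return e->digit.ival;
    int l = eval(e->op.l_oprd);
    int r = eval(e->op.r_oprd);
    switch (e->op.oprt) {
    case '+': return l + r;
    case '-': return l - r;
    case '*': return l * r;
    default:  return 0;  /* unknown operator: arbitrary fallback */
    }
}
```

No void * and no casts are needed; the flag decides which member of the union is read.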
Edit ...
If you need to be able to inspect the data itself in order to figure out which member of the union is in use, you can.
Basically, if the top-most fields in two structs are declared the same then they will have the same binary representation. This isn't just limited to unions, this is true in general across all binaries compiled for that architecture (if you think about it, this is essential for libraries to work).
In unions it is common to pull those out into a separate struct so that it's obvious what you're doing, although it's not required:
union {
struct {
int ID;
} base;
struct {
int ID;
char *data;
} A;
struct {
int ID;
int *numeric_data;
} B;
};
In this scheme, p->base.ID, p->A.ID, p->B.ID are guaranteed to read the same.
Just convert p to the relevant pointer type:
s *a = p;
a->i1 = 42;
a->i2 = 31;
or
((s *) p)->i1 = 42;
((s *) p)->i2 = 31;
You could cast it:
((s*)p)->i1=1;
((s*)p)->i2=2;
I don't see any entity_flag in struct s but if you mean expr the same applies:
unsigned char flag=((expr*)vp)->entity_flag;
If you know the offset at where your struct member is you can do pointer arithmetic and then cast to the appropriate type according to the value of entity_flag.
I would strongly suggest to align both structures in bytes and use the same number of bytes for oprt and digit.
Also, if you only have oprt and digit "types" in your tree you could sacrifice the first bit of precision to flag for digit or oprt and save the space needed for unsigned char entity_flag. If you use a single 4 bytes int var for both oprt and digit and use the first bit to encode the type you can extract a digit by (using the union solution pattern: proposed in the thread)
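typedef union expr expr; /* forward declaration so the members can use expr * */
union expr {
struct {
int code;
expr * l_expr;
expr * r_expr;
} oprt;
struct {
int val;
} digit;
};
expr *x; /* assumed to point to a valid digit node; int assumed to be 32 bits */
unsigned raw_digit = (unsigned)x->digit.val;
/* drop the flag in bit 31, then copy bit 30 into it to restore the sign (2's complement) */
int digit = (int)((raw_digit & 0x7FFFFFFFu) | ((raw_digit & 0x40000000u) << 1));
x->digit.val = (int)((unsigned)digit | 0x80000000u); // assuming MSB==1 means digit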
Using a union does not necessarily use more memory for digits. Basically a digit only takes 4 bytes. So every time you need to allocate a digit-type expr you can simply call malloc(4), cast the result to expr *, and set the MSB to 1 accordingly. If you encode and decode expr pointers without bugs, you'll never try to reach beyond the 4th byte of a "digit" type expr ... hopefully. I don't recommend this solution if you need safety ^_^
To check for expr types easily, you can use bitfield inside the union I believe:
typedef union expr expr; /* forward declaration so the members can use expr * */
union expr {
struct {
int code;
expr * l_expr;
expr * r_expr;
} oprt;
struct {
int val;
} digit;
struct {
unsigned int is_digit : 1;
int : 31; //unused
} type;
};

When are anonymous structs and unions useful in C11?

C11 adds, among other things, 'Anonymous Structs and Unions'.
I poked around but could not find a clear explanation of when anonymous structs and unions would be useful. I ask because I don't completely understand what they are. I get that they are structs or unions without the name afterwards, but I have always (had to?) treat that as an error so I can only conceive a use for named structs.
Anonymous unions inside structures are very useful in practice. Consider that you want to implement a discriminated sum type (or tagged union), an aggregate with a boolean and either a float or a char* (i.e. a string), depending upon the boolean flag. With C11 you should be able to code
typedef struct {
bool is_float;
union {
float f;
char* s;
};
} mychoice_t;
double as_float(mychoice_t* ch)
{
if (ch->is_float) return ch->f;
else return atof(ch->s);
}
With C99, you'll have to name the union, and code ch->u.f and ch->u.s which is less readable and more verbose.
Another way to implement some tagged union type is to use casts. The OCaml runtime gives a lot of examples.
The SBCL implementation of Common Lisp does use some union to implement tagged union types. And GNU make also uses them.
A typical real-world use of anonymous structs and unions is to provide an alternative view of data. For example when implementing a 3D point type:
typedef struct {
union{
struct{
double x;
double y;
double z;
};
double raw[3];
};
}vec3d_t;
vec3d_t v;
v.x = 4.0;
v.raw[1] = 3.0; // Equivalent to v.y = 3.0
v.z = 2.0;
This is useful if you interface to code that expects a 3D vector as a pointer to three doubles. Instead of doing f(&v.x) which is ugly, you can do f(v.raw) which makes your intent clear.
struct bla {
struct { int a; int b; };
int c;
};
the type struct bla has a member of a C11 anonymous structure type.
struct { int a; int b; } has no tag and the object has no name: it is an anonymous structure type.
You can access the members of the anonymous structure this way:
struct bla myobject;
myobject.a = 1; // a is a member of the anonymous structure inside struct bla
myobject.b = 2; // same for b
myobject.c = 3; // c is a member of the structure struct bla
Another useful application is when you are dealing with RGBA colors, since you might want to access each channel on its own or all of them as a single int.
typedef struct {
union{
struct {uint8_t a, b, g, r;};
uint32_t val;
};
}Color;
Now you can access the individual rgba values or the entire value, with its highest byte being r on a little-endian machine, i.e.:
int main(void)
{
Color x;
x.r = 0x11;
x.g = 0xAA;
x.b = 0xCC;
x.a = 0xFF;
printf("%X\n", x.val);
return 0;
}
This prints 11AACCFF on a little-endian machine.
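The value seen through the union depends on the byte order of the machine. When the packed value must be the same everywhere, shifts can be used instead of the overlay. A minimal sketch; pack_rgba and red_of are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the four channels with r in the highest byte, independent of
   the byte order of the machine. */
uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

/* Extracts the red channel from a packed value. */
uint8_t red_of(uint32_t val) { return (uint8_t)(val >> 24); }
```

The union stays convenient for in-memory access; the shifts matter once the 32-bit value is written to a file or sent over a network.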
I'm not sure why C11 allows anonymous structures inside structures. But Linux uses it with a certain language extension:
/**
* struct blk_mq_ctx - State for a software queue facing the submitting CPUs
*/
struct blk_mq_ctx {
struct {
spinlock_t lock;
struct list_head rq_lists[HCTX_MAX_TYPES];
} ____cacheline_aligned_in_smp;
/* ... other fields without explicit alignment annotations ... */
} ____cacheline_aligned_in_smp;
I'm not sure if that example is strictly necessary, except to make the intent clear.
EDIT: I found another similar pattern which is more clear-cut. The anonymous struct feature is used with this attribute:
#if defined(RANDSTRUCT_PLUGIN) && !defined(__CHECKER__)
#define __randomize_layout __attribute__((randomize_layout))
#define __no_randomize_layout __attribute__((no_randomize_layout))
/* This anon struct can add padding, so only enable it under randstruct. */
#define randomized_struct_fields_start struct {
#define randomized_struct_fields_end } __randomize_layout;
#endif
I.e. a language extension / compiler plugin to randomize field order (ASLR-style exploit "hardening"):
struct kiocb {
struct file *ki_filp;
/* The 'ki_filp' pointer is shared in a union for aio */
randomized_struct_fields_start
loff_t ki_pos;
void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
void *private;
int ki_flags;
u16 ki_hint;
u16 ki_ioprio; /* See linux/ioprio.h */
unsigned int ki_cookie; /* for ->iopoll */
randomized_struct_fields_end
};
Well, if you declare variables from that struct only once in your code, why does it need a name?
struct {
int a;
struct {
int b;
int c;
} d;
} e,f;
And you can now write things like e.a,f.d.b,etc.
(I added the inner struct, because I think that this is one of the most common uses of anonymous structs)

Resources