Basically, what I want is some kind of compile-time generated version that is associated with the exact definition of a struct. If the definition of the struct changes in any way (field added, moved, maybe renamed), I want that version to change, too.
Such a version constant would be useful when reading in a previously serialized struct, to make sure that it's still compatible. The alternative would be maintaining a manually specified constant, which has potentially confusing effects if incrementing it is forgotten (deserializing produces garbage), and also raises the question of when exactly to increment it (during development and testing, or only for some kind of release).
This could be achieved by using an external tool to generate a hash over the struct definition, but I'm wondering if it is possible with the C compiler (and/or maybe its preprocessor) itself.
This is actually some form of introspection and so I suspect that this may not be possible at all in ANSI C, but I would be happy with a solution that works with gcc and clang.
The Windows API used to (still does?) have a size member as one of the first members of a struct, so that it knew what version of the struct it was being passed (see WNDCLASSEX as an example):
struct Foo
{
size_t size;
char *bar;
char *baz;
/* Other fields */
};
And before calling you set the size using sizeof:
struct Foo f;
f.size = sizeof(struct Foo);
f.bar = strdup("hi");
f.baz = strdup("there");
somefunc(&f);
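Then somefunc can tell, from the size member, which version of the struct it was given. Because sizeof is evaluated at compile time, the size is baked into the caller's binary: a program compiled against an older, smaller version of the struct keeps passing the old size, which allows for backward ABI compatibility.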
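A minimal sketch of the callee side, assuming a hypothetical version 2 of struct Foo that appends one member, qux, and a hypothetical somefunc that returns an int. The scheme is only reliable if each new version is strictly larger and keeps the old members at the same offsets, so both are checked at compile time. Reading the size through a struct Foo pointer relies on the common-initial-layout behaviour that platform ABIs provide in practice, as the Windows API does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version 1, as old callers were compiled. */
struct Foo_v1 {
    size_t size;
    char *bar;
    char *baz;
};

/* Hypothetical version 2: one member appended, old offsets unchanged. */
struct Foo {
    size_t size;
    char *bar;
    char *baz;
    int qux; /* new in version 2 */
};

/* The size can only discriminate versions if the sizes actually differ. */
_Static_assert(sizeof(struct Foo) > sizeof(struct Foo_v1),
               "version 2 must be larger than version 1");
_Static_assert(offsetof(struct Foo, baz) == offsetof(struct Foo_v1, baz),
               "old members must keep their offsets");

/* Returns qux for version-2 callers, -1 for version-1 callers,
   and -2 if the size is too small to be any known version. */
int somefunc(const struct Foo *f)
{
    assert(f != NULL);
    if (f->size >= sizeof(struct Foo))
        return f->qux;  /* caller knows about qux */
    if (f->size >= sizeof(struct Foo_v1))
        return -1;      /* old caller: qux must not be read */
    return -2;          /* invalid size */
}
```

Using `>=` rather than `==` lets a version-2 library accept callers built against some future, even larger version, reading only the members it knows about.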
There is nothing that does it automatically, but you can build something that works reasonably reliably: use sizeof and offsetof, and combine them in such a way that the order in which you combine them matters. Here is an example:
#include <stdio.h>
#include <stddef.h>
#define COMBINE2(a,b) ((a)*31+(b)*11)
#define COMBINE3(a,b,c) COMBINE2(COMBINE2(a,b),c)
#define COMBINE4(a,b,c,d) COMBINE2(COMBINE3(a,b,c),d)
typedef struct A {
int a1;
char a2;
float a3;
} A;
typedef struct B {
int b1;
char b2;
double b3;
} B;
typedef struct C {
char c2;
int c1;
float c3;
} C;
typedef struct D {
int d1;
char d2;
float d3;
int forgotten[2];
} D;
int main(void) {
size_t aSign = COMBINE4(sizeof(A), offsetof(A,a1), offsetof(A,a2), offsetof(A,a3));
size_t bSign = COMBINE4(sizeof(B), offsetof(B,b1), offsetof(B,b2), offsetof(B,b3));
size_t cSign = COMBINE4(sizeof(C), offsetof(C,c1), offsetof(C,c2), offsetof(C,c3));
size_t dSign = COMBINE4(sizeof(D), offsetof(D,d1), offsetof(D,d2), offsetof(D,d3));
printf("%zu %zu %zu %zu\n", aSign, bSign, cSign, dSign); /* %zu is the format for size_t */
return 0;
}
On a typical x86-64 platform (4-byte int and float, 8-byte double) this code prints
358944 478108 399864 597272
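As you can see, this code produces a constant for each structure that reacts to reordering of fields of different sizes and to changes in field types. Because sizeof and offsetof are integer constant expressions, these signatures are in fact compile-time constants. The signature also usually reacts to added fields even if you forget to update the list of fields the computation is based on (compare A and D), because the size of the struct changes; this provides some sort of safety net. Renaming a field, on the other hand, does not change the signature, because only sizes and offsets go into it.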
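A sketch of how such a signature might be used for the serialization case from the question: it is written in front of the data and compared on reading. The names A_SIGNATURE, write_A and read_A are hypothetical. Dumping raw struct bytes like this only round-trips between builds that share the same compiler and ABI; the signature guards against layout changes, not against moving files between platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define COMBINE2(a,b) ((a)*31+(b)*11)
#define COMBINE3(a,b,c) COMBINE2(COMBINE2(a,b),c)
#define COMBINE4(a,b,c,d) COMBINE2(COMBINE3(a,b,c),d)

typedef struct A {
    int a1;
    char a2;
    float a3;
} A;

/* An integer constant expression: usable in static assertions, case labels, etc. */
#define A_SIGNATURE ((unsigned long)COMBINE4(sizeof(A), offsetof(A,a1), offsetof(A,a2), offsetof(A,a3)))

/* Writes the layout signature followed by the raw struct. Returns 0 on success. */
int write_A(FILE *fp, const A *a)
{
    unsigned long sig = A_SIGNATURE;
    assert(fp != NULL && a != NULL);
    if (fwrite(&sig, sizeof sig, 1, fp) != 1) return -1;
    if (fwrite(a, sizeof *a, 1, fp) != 1) return -1;
    return 0;
}

/* Returns 0 on success, -1 on I/O error, -2 if the stored layout differs. */
int read_A(FILE *fp, A *out)
{
    unsigned long sig;
    assert(fp != NULL && out != NULL);
    if (fread(&sig, sizeof sig, 1, fp) != 1) return -1;
    if (sig != A_SIGNATURE) return -2;  /* written by an incompatible build */
    if (fread(out, sizeof *out, 1, fp) != 1) return -1;
    return 0;
}
```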
Is there a way of knowing the type of a struct member at compile time? Something analogous to offsetof(), but for types.
E.g., something like:
typedef struct{
int b;
char c;
}a_t;
typeof(a_t,b) a_get_b(void* data){
return *(typeof(a_t,b)*)(data + offsetof(a_t,b));
}
If you're willing to use typeof (a very common nonstandard C extension for many years, and now standardized in C23), you can apply it to a member obtained from a compound literal, as in typeof((a_t){0}.b):
typedef struct{ int b; char c; }a_t;
typeof((a_t){0}.b) a_get_b(void* data){ return (a_t*){data}->b; }
(Given a type a_t, (a_t){0} is a reliable way to get an instance of it. Because of how initialization works in C, the 0 initializes the first scalar member, descending into nested aggregates if necessary, and every scalar type can be initialized from 0; all other members are zero-initialized.)
As for obtaining the member from a void pointer pointing to the container, you could do:
*(typeof(&(a_t){0}.b))((char*)data + offsetof(a_t,b))
but that's just an awfully long-winded way to do:
(a_t*){data}->b
(which is equivalent to the former as long as data really points to an a_t).
Another way that works:
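A self-contained sketch of both forms from this answer. MEMBER_TYPE and a_get_c are hypothetical names; the code spells the operator __typeof__, which GCC and Clang accept even in strict ISO modes (C23 also spells it typeof). The compound literal inside __typeof__ is never evaluated, so no object is actually created.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int b; char c; } a_t;

/* Type of member m of struct type T, taken from an unevaluated compound literal. */
#define MEMBER_TYPE(T, m) __typeof__(((T){0}).m)

/* Direct form: cast the void pointer and use ->. */
MEMBER_TYPE(a_t, b) a_get_b(void *data)
{
    assert(data != NULL);
    return ((a_t *)data)->b;
}

/* Long-winded offsetof form; equivalent to ((a_t *)data)->c. */
MEMBER_TYPE(a_t, c) a_get_c(void *data)
{
    assert(data != NULL);
    return *(MEMBER_TYPE(a_t, c) *)((char *)data + offsetof(a_t, c));
}
```

A C11 _Generic selection such as `_Generic((MEMBER_TYPE(a_t, b))0, int: 1, default: 0)` can confirm which type the macro produced.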
#include <stdio.h>
#define typeof_element(_struct,el) typeof(((_struct *)(0))->el)
typedef struct{
int a;
int b;
}Row;
int main()
{
typeof_element(Row, a) value_a = 10;
printf("%d\n", value_a);
return 0;
}
Another way (other than Jerry Jeremiah's) is:
#define struct_get(STRUCT,ELEM) (*(typeof((STRUCT).ELEM)*)((char*)&(STRUCT) + offsetof(typeof(STRUCT),ELEM)))
I have defined two data structures that must remain the same size as each other for the application to function properly. The structs are used to communicate between a PC and a DSP. The DSP code is in C, the PC side in C++.
For example:
struct inbound_data{
int header[5];
float val1;
float val2;
int trailer[3];
};
struct outbound_data{
int header[5];
int reply1;
int reply2;
float dat1;
float dat2;
int filler[1];
};
Later I will do something like:
int tx_block[sizeof(struct outbound_data)];
int rx_block[sizeof(struct inbound_data)];
These arrays will be passed to the communication peripherals to transmit and receive between the devices.
Because of how the hardware works, it is essential that the sizes of the two structs match, so that the buffers are of equal size. This is easy enough to ensure with proper care, but occasionally during the design cycle the data structures get modified. If one is not extremely careful, and aware of the requirement that the structures stay the same size (and that this be reflected in the PC side code as well), chaos ensues.
I would like to find a compile time way to have the code not build if one of the structures gets modified so that it does not match the size of the other structure.
Is it possible somehow in 'standard' C to check the sizes at compile time and fail if they are different? (I think my compiler is at least C99, maybe not C11.)
If you must use C99, then I, like Swordfish, would suggest a macro. The way to make one that can appear anywhere, and that does not introduce any objects for the optimizer to remove, is to put the invalid array in a typedef. So a more general-purpose static assertion would look like this:
#define CONCAT_(A,B) A##B
#define CONCAT(A,B) CONCAT_(A,B)
#define MY_STATIC_ASSERT(p, msg) typedef char CONCAT(dummy__,__LINE__) [(p) ? 1 : -1]
It's designed to mimic _Static_assert. The message is passed in with the hope that the compiler diagnostic will show it. For example, using it on line 13 of a file to check that struct foo and struct baz have the same size (they don't)
produces:
main.cpp:4:54: error: size of array 'dummy__13' is negative
#define MY_STATIC_ASSERT(p, msg) typedef char CONCAT(dummy__,__LINE__) [(p) ? 1 : -1]
^~~~~~~
main.cpp:2:22: note: in definition of macro 'CONCAT_'
#define CONCAT_(A,B) A##B
^
main.cpp:4:47: note: in expansion of macro 'CONCAT'
#define MY_STATIC_ASSERT(p, msg) typedef char CONCAT(dummy__,__LINE__) [(p) ? 1 : -1]
^~~~~~
main.cpp:13:1: note: in expansion of macro 'MY_STATIC_ASSERT'
MY_STATIC_ASSERT(sizeof(struct foo) == sizeof(struct baz), "Do not match!");
And all the way down there you can see the static assertion with the message.
As an afterthought, you can change dummy__ to please_check_line_, which will produce the more descriptive please_check_line_13 above.
The C11 Standard added a new keyword _Static_assert. You can use it to test a predicate at compile-time, and produce an error if it is false:
_Static_assert(sizeof(struct outbound_data) == sizeof(struct inbound_data), "sizes must match");
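Since the question is unsure whether the DSP compiler supports C11, one option is a header that picks _Static_assert when __STDC_VERSION__ reports C11 or later and falls back to the negative-array typedef otherwise. SIZE_CHECK and its helpers are hypothetical names; the fallback branch is also what a C++ compiler (the PC side) takes, since C++ does not define __STDC_VERSION__, and the typedef is valid C++ as well.

```c
#include <assert.h>

#define SC_CONCAT_(a, b) a##b
#define SC_CONCAT(a, b) SC_CONCAT_(a, b)

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define SIZE_CHECK(p, msg) _Static_assert(p, msg)
#else
/* C99 / C++ fallback: an array of size -1 is a compile error; __LINE__ keeps names unique. */
#define SIZE_CHECK(p, msg) typedef char SC_CONCAT(size_check_line_, __LINE__)[(p) ? 1 : -1]
#endif

struct inbound_data {
    int header[5];
    float val1;
    float val2;
    int trailer[3];
};

struct outbound_data {
    int header[5];
    int reply1;
    int reply2;
    float dat1;
    float dat2;
    int filler[1];
};

/* The build fails here as soon as an edit makes the sizes diverge. */
SIZE_CHECK(sizeof(struct outbound_data) == sizeof(struct inbound_data),
           "inbound_data and outbound_data must have the same size");
```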
There is no standard way to enforce this in C. There are only ways to detect it, such as static_assert, which prevents buggy code from compiling but doesn't solve the actual problem.
In your case there are several problems:
Your structs use the plain default types of C. These aren't portable: int, for example, only has to be at least 16 bits wide and its size differs between platforms. This can easily be fixed by swapping int for int32_t etc.
Endianness might make the code non-portable regardless of integer type. That's a separate issue that I won't address here, but it needs to be considered, especially for exotic DSPs.
Any struct can contain padding bytes anywhere after its first member, to satisfy system-specific alignment requirements. The root of the problem is that alignment works differently on different systems. This is the hard one to solve.
The dirty fix to avoid padding is to use static_assert together with some non-standard way of ensuring that the struct has the expected size, such as #pragma pack(1) or gcc's __attribute__ ((__packed__)). These are neither standard nor portable. Furthermore, skipping padding can be problematic on many systems, and you can get issues with misaligned access: padding is there for a reason. So this can potentially create more problems than it solves.
So unfortunately we end up with the realisation that structs are unsuitable for portable code, particularly for things like data protocol specifications.
If you need truly portable, rugged code, that leaves you with only one option: use a raw data array of uint8_t. If you need to translate this array into structs, you will have to write serialization/de-serialization code, which costs some run-time overhead. But there's no other way around it if you want truly portable structs.
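A minimal sketch of such serialization code for the question's inbound_data, with int swapped for int32_t as suggested. The wire format is an assumption for illustration: ten 32-bit words in big-endian order, floats transferred as their IEEE 754 bit patterns (which assumes both ends use 32-bit IEEE 754 floats). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INBOUND_WIRE_SIZE (10 * 4) /* hypothetical format: 10 big-endian 32-bit words */

struct inbound_data {
    int32_t header[5];
    float val1;
    float val2;
    int32_t trailer[3];
};

_Static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");

static uint8_t *put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);  /* most significant byte first */
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return p + 4;
}

static const uint8_t *get_u32(const uint8_t *p, uint32_t *v)
{
    *v = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | (uint32_t)p[3];
    return p + 4;
}

/* memcpy reinterprets the float's bits without aliasing problems. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

void inbound_serialize(const struct inbound_data *d, uint8_t out[INBOUND_WIRE_SIZE])
{
    uint8_t *p = out;
    for (int i = 0; i < 5; i++) p = put_u32(p, (uint32_t)d->header[i]);
    p = put_u32(p, float_bits(d->val1));
    p = put_u32(p, float_bits(d->val2));
    for (int i = 0; i < 3; i++) p = put_u32(p, (uint32_t)d->trailer[i]);
    assert(p == out + INBOUND_WIRE_SIZE);
}

void inbound_deserialize(const uint8_t in[INBOUND_WIRE_SIZE], struct inbound_data *d)
{
    const uint8_t *p = in;
    uint32_t v;
    /* uint32_t -> int32_t for values above INT32_MAX is implementation-defined;
       GCC and Clang wrap modulo 2^32, which restores negative values. */
    for (int i = 0; i < 5; i++) { p = get_u32(p, &v); d->header[i] = (int32_t)v; }
    p = get_u32(p, &v); d->val1 = bits_float(v);
    p = get_u32(p, &v); d->val2 = bits_float(v);
    for (int i = 0; i < 3; i++) { p = get_u32(p, &v); d->trailer[i] = (int32_t)v; }
}
```

The wire size no longer depends on padding or on the compiler's choice of layout, so both ends only have to agree on INBOUND_WIRE_SIZE and the field order.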
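For C99 you could use something like the following (unlike the typedef version, it is a compound statement, so it can only be used inside a function body):
#define C_ASSERT(x, y) { int dummy[(x) == (y) ? 1 : -1]; (void)dummy; }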
struct foo {
int f;
};
struct bar {
int b1;
//int b2;
};
int main()
{
C_ASSERT(sizeof(struct foo), sizeof(struct bar));
}
You can add padding to equalize the sizes:
struct inbound_data;
struct outbound_data;
struct _inbound_data{
int header[5];
float val1;
float val2;
int trailer[3];
};
struct _outbound_data{
int header[5];
int reply1;
int reply2;
float dat1;
float dat2;
int filler[1];
};
struct inbound_data{
int header[5];
float val1;
float val2;
int trailer[3];
char padding[sizeof(struct _inbound_data) < sizeof(struct _outbound_data) ? sizeof(struct _outbound_data) - sizeof(struct _inbound_data) : 0];
};
struct outbound_data{
int header[5];
int reply1;
int reply2;
float dat1;
float dat2;
int filler[1];
char padding[sizeof(struct _outbound_data) < sizeof(struct _inbound_data) ? sizeof(struct _inbound_data) - sizeof(struct _outbound_data) : 0];
};
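It can, of course, be written more concisely without duplicating the struct members, as below; I did it this way intentionally to show the idea. Note that the padding array of the larger struct ends up with size 0, which ISO C does not allow; GCC and Clang accept zero-length arrays as an extension.
struct __attribute__((packed)) inbound_data1 {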
struct _inbound_data id;
char padding[sizeof(struct _inbound_data) < sizeof(struct _outbound_data) ? sizeof(struct _outbound_data) - sizeof(struct _inbound_data) : 0];
};
struct __attribute__((packed)) outbound_data1 {
struct _outbound_data od;
char padding[sizeof(struct _outbound_data) < sizeof(struct _inbound_data) ? sizeof(struct _inbound_data) - sizeof(struct _outbound_data) : 0];
};
Suppose a C library has to share the details of a structure with the application code and has to maintain API and ABI backward compatibility. It tries to do this by checking the size of the structure passed to it.
Say, the following structure needs to be updated. In library version 1,
typedef struct {
int size;
char* x;
int y;
} foo;
In version 2 of the library, it is updated to:
typedef struct {
int size;
char* x;
int y;
int z;
} foo_2;
Now, library version 2 wants to check if the application is passing the new foo_2 or the old foo as an argument, arg, to a function. It assumes that the application has set arg.size to sizeof(foo) or sizeof(foo_2) and attempts to figure out whether the application code groks version 2.
if(arg.size == sizeof(foo_2)) {
// The application groks version 2 of the library. So, arg.z is valid.
} else {
// The application uses version 1 of the library. arg.z is not valid.
}
I'm wondering why this won't fail. On GCC 4.6.3, with -O3 flag, both sizeof(foo) and sizeof(foo_2) are 24. So, won't v2 library code fail to understand if the application is passing a struct of type foo or foo_2? If yes, how come this approach seems to have been used?
http://wezfurlong.org/blog/2006/dec/coding-for-coders-api-and-abi-considerations-in-an-evolving-code-base/
http://blogs.msdn.com/b/oldnewthing/archive/2003/12/12/56061.aspx
Follow on question: Is there a good reason to favor the use of sizeof(struct) for version discrimination? As pointed out in the comments, why not use an explicit version member in the shared struct?
In order to match your observations, I posit
char* has size 8 and alignment 8.
int has size 4 and alignment 4.
Your implementation inserts only the padding that alignment requires.
You are quite right that in that case both your old and new structure have the same size, and since your version discriminator is the structure's size, the upgrade is an ABI-breaking change. (This is a logic error, not a syntax error, so the compiler will not diagnose it.)
Under that scheme, only changes that result in a bigger size, with the new struct containing all the fields of the old one at the same offsets, can be ABI-compatible. If necessary, add some dummy members to make the new version larger.
There is one possibility which might save the day though:
If a field contains a value that was previously invalid, that can signal that everything else has to be interpreted differently.
If you want to use this scheme to distinguish different versions of your API you simply have to make sure that the different struct versions have different sizes.
To do so, you can either try to make foo smaller by forcing the compiler to use tighter packing, or you can make foo_2 larger by adding additional (unused) fields.
In any case, you should add an assertion (preferably at compile time) that sizeof(foo) != sizeof(foo_2), to make sure the structs always actually have different sizes.
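A sketch of that compile-time check, applied to the question's foo and foo_2. The extra reserved member is a hypothetical padding member that makes foo_2 larger on LP64, where the z member alone fits into existing trailing padding; the static assertions, not the choice of member, are what guarantee the sizes differ and that the old members did not move. foo_version is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int size;
    char* x;
    int y;
} foo;

typedef struct {
    int size;
    char* x;
    int y;
    int z;
    char* reserved; /* hypothetical: unused, only forces a larger size */
} foo_2;

_Static_assert(sizeof(foo) != sizeof(foo_2), "foo and foo_2 must differ in size");
_Static_assert(offsetof(foo, x) == offsetof(foo_2, x) &&
               offsetof(foo, y) == offsetof(foo_2, y),
               "old members must keep their offsets");

/* Returns 2 for a foo_2 caller, 1 for a foo caller, 0 for an unknown size. */
int foo_version(const foo *arg)
{
    if (arg->size == (int)sizeof(foo_2)) return 2;
    if (arg->size == (int)sizeof(foo)) return 1;
    return 0;
}
```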
I suggest the use of an intermediate structure.
For example:
typedef struct
{
int version;
void* data;
} foo_interface;
typedef struct
{
char* x;
int y;
} foo;
typedef struct
{
char* x;
int y;
int z;
} foo_2;
In my library version 2, I would export by name the following function:
#include <stdlib.h> /* malloc */
foo_interface* getFooObject(void)
{
    foo_interface* objectWrapper = malloc(sizeof(foo_interface));
    foo_2* realObject = malloc(sizeof(foo_2));
    /* Fill foo_2 with random data... */
    realObject->x = malloc(1 * sizeof(char));
    realObject->y = 2;
    realObject->z = 3;
    /* Fill our interface. */
    objectWrapper->version = 2; /* Here we specify version 2. */
    objectWrapper->data = realObject;
    /* Return our wrapped data. */
    return objectWrapper;
}
Then in the main application I would do:
int main(int ac, char **av)
{
    /* Load library + Retrieve getFooObject() function here.
       myLibrary stands for however the library was loaded. */
    foo_interface* objectWrapper = myLibrary.getFooObject();
    switch (objectWrapper->version)
    {
    case 1:
    {   /* braces give each case its own scope for realObject */
        foo* realObject = (foo*)(objectWrapper->data);
        /* Do something with foo here. */
        break;
    }
    case 2:
    {
        foo_2* realObject = (foo_2*)(objectWrapper->data);
        /* Do something with foo_2 here. */
        break;
    }
    default:
        printf("Unknown foo version!\n"); /* requires <stdio.h> */
        break;
    }
    return (0);
}
As usual, security checks (when allocating memory for example) are not included for readability of the code.
Also, I would use stdint.h to get fixed-width integer types, so that integer sizes are the same across different architectures. For example, instead of int I would use int32_t. (This does not fix the size of double or of pointer types such as char*.)
I'm taking over a piece of C code on Linux. I made a small change to a struct:
typedef struct {
unsigned int a1;
..
..
..
float f1;
unsigned int a2;
unsigned int a3;
unsigned int offending; // shifted this
} test;
I shifted unsigned int offending to before float f1, like this:
typedef struct {
unsigned int a1;
..
..
..
unsigned int offending;
float f1;
unsigned int a2;
unsigned int a3;
} test;
and the code crashes. What could be the problem?
Is the order of members of a c struct important?
What could be the problem? That depends on the rest of the code, and on what else you did.
No, the order of members of a struct is not intrinsically important. It becomes important when other code depends on it.
Possible causes (not exhaustive):
You didn't recompile everything and there is external linkage on this struct or some aspect of it.
By moving the member you changed the offsets of other members and/or the size of the struct, and didn't compensate for that.
There is a literal constant or macro somewhere with a size or offset that depends on this struct.
There is faulty code which never failed before but does now because of a change in memory layout.
The struct is used somewhere as part of another struct or union, and the problem is related to that.
There is a list initialisation using {} that no longer matches the member order.
You really should provide details of how it crashes. Otherwise it's guesswork, and perhaps even then.
Edit: hat tip to Jens.
The most probable reason for crashes when you change the data layout is initialization. If you have old-style positional initializers in your code that rely on declaration order, all of a sudden the fields will receive different values than before. That is why C, since C99, has designated initializers that avoid this problem:
test toto = { 32, ... , 42, }; // sensitive to reordering
test tata = { .a1 = 32, ... , .offending = 42, }; // still the same
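A small self-contained illustration, assuming a hypothetical reduced version of the question's struct (only a1, f1, a2, a3 and offending) after offending was moved. The positional initializer was written for the old order; it still compiles, but every value after a1 lands in the wrong member, while the designated one is unaffected.

```c
#include <assert.h>

/* Hypothetical reduced struct, after moving 'offending' before f1. */
typedef struct {
    unsigned int a1;
    unsigned int offending; /* moved */
    float f1;
    unsigned int a2;
    unsigned int a3;
} test;

/* Written for the OLD order (a1, f1, a2, a3, offending): compiles silently,
   but now 1.5f is converted into 'offending' and 42 ends up in a3. */
static const test positional = { 32, 1.5f, 2, 3, 42 };

/* Designated initializers bind by name and survive the reordering. */
static const test designated = { .a1 = 32, .f1 = 1.5f, .a2 = 2, .a3 = 3, .offending = 42 };
```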
I read about __packed__, and I understood that when __packed__ is used on a struct or union, the members are placed in such a way as to minimize the memory required to store the struct or union.
Now, consider the structures in the following code. They contain same elements (same type, same variable names and placed in the same order). The difference is, one is __packed__ and the other is not.
#include <stdio.h>
int main(void)
{
typedef struct unpacked_struct {
char c;
int i;
float f;
double d;
}ups;
typedef struct __attribute__ ((__packed__)) packed_struct {
char c;
int i;
float f;
double d;
}ps;
printf("sizeof(my_unpacked_struct) : %zu \n", sizeof(ups));
printf("sizeof(my_packed_struct) : %zu \n", sizeof(ps));
ups ups1 = init_ups();
ps ps1;
return 0;
}
Is there a way where we can copy unpacked structure ups1 into packed structure ps1 without doing a member-variable-wise-copy? Is there something like memcpy() that is applicable here?
I'm afraid you've just gotta write it out. Nothing in standard C (or any standard I know of) will do this for you. Write it once and never think about it again.
ps ups_to_ps(ups ups) {
return (ps) {
.c = ups.c,
.i = ups.i,
.f = ups.f,
.d = ups.d,
};
}
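A self-contained version of this conversion in both directions, moving the typedefs out of main so the functions can see them. A single memcpy cannot work because the members sit at different offsets in the two layouts; member-wise assignment lets the compiler emit whatever unaligned accesses the packed layout needs. __attribute__((__packed__)) is GCC/Clang-specific, and ps_to_ups is a hypothetical counterpart added for symmetry.

```c
#include <assert.h>

typedef struct unpacked_struct {
    char c;
    int i;
    float f;
    double d;
} ups;

/* GCC/Clang extension: no padding between members. */
typedef struct __attribute__((__packed__)) packed_struct {
    char c;
    int i;
    float f;
    double d;
} ps;

/* Copy member by member; assignments through the struct are safe even for
   misaligned packed members (taking their addresses would not be). */
ps ups_to_ps(ups u)
{
    return (ps){ .c = u.c, .i = u.i, .f = u.f, .d = u.d };
}

ups ps_to_ups(ps p)
{
    return (ups){ .c = p.c, .i = p.i, .f = p.f, .d = p.d };
}
```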
Without detailed knowledge of the differences in memory layout between the two structures: no.