How does sizeof(struct) help provide ABI compatibility?

Suppose a C library has to share the details of a structure with the application code and has to maintain API and ABI backward compatibility. It tries to do this by checking the size of the structure passed to it.
Say the following structure needs to be updated. In version 1 of the library:
typedef struct {
int size;
char* x;
int y;
} foo;
In version 2 of the library, it is updated to:
typedef struct {
int size;
char* x;
int y;
int z;
} foo_2;
Now, library version 2 wants to check if the application is passing the new foo_2 or the old foo as an argument, arg, to a function. It assumes that the application has set arg.size to sizeof(foo) or sizeof(foo_2) and attempts to figure out whether the application code groks version 2.
if(arg.size == sizeof(foo_2)) {
// The application groks version 2 of the library. So, arg.z is valid.
} else {
// The application uses version 1 of the library. arg.z is not valid.
}
I'm wondering why this won't fail. On GCC 4.6.3, with -O3 flag, both sizeof(foo) and sizeof(foo_2) are 24. So, won't v2 library code fail to understand if the application is passing a struct of type foo or foo_2? If yes, how come this approach seems to have been used?
http://wezfurlong.org/blog/2006/dec/coding-for-coders-api-and-abi-considerations-in-an-evolving-code-base/
http://blogs.msdn.com/b/oldnewthing/archive/2003/12/12/56061.aspx
Follow on question: Is there a good reason to favor the use of sizeof(struct) for version discrimination? As pointed out in the comments, why not use an explicit version member in the shared struct?

In order to match your observations, I posit
char* has size 8 and alignment 8.
int has size 4 and alignment 4.
Your implementation uses optimal packing.
You are quite right that in that case both your old and new structure would have the same size, and since your version discriminator is the structure's size, the upgrade is an ABI-breaking change. (Few logic errors are also syntax errors, and logic errors are generally not diagnosed by a compiler.)
Only changes to the structure which result in a bigger size, with the new struct containing all the fields of the old one at the same offsets, can be ABI-compatible under that scheme, so add some dummy members if that is what it takes to make the size grow.
There is one possibility which might save the day though:
If a field contains a value which was previously invalid, that can indicate that the rest of the structure has to be interpreted differently.

If you want to use this scheme to distinguish different versions of your API you simply have to make sure that the different struct versions have different sizes.
To do so, you can either try to make foo smaller by forcing the compiler to use tighter packing, or you can make foo_2 larger by adding additional (unused) fields.
In any case, you should add an assertion (preferably at compile time) that sizeof(foo) != sizeof(foo_2), to make sure the structs always actually have different sizes.
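A minimal sketch of that compile-time check, assuming a C11 compiler for _Static_assert. The reserved member is a hypothetical dummy field added only to push foo_2 to a different size, and foo_get_z is a hypothetical library function that dispatches on the size member. It reads size through a pointer to the struct's first member, which C guarantees sits at offset 0.

```c
#include <stddef.h>

/* Version 1 layout, as in the question. */
typedef struct {
    int size;
    char *x;
    int y;
} foo;

/* Version 2 layout; `reserved` is a hypothetical dummy member whose only
   job is to make sizeof(foo_2) differ from sizeof(foo). */
typedef struct {
    int size;
    char *x;
    int y;
    int z;
    int reserved;
} foo_2;

/* Fail the build if the two versions ever end up the same size... */
_Static_assert(sizeof(foo) != sizeof(foo_2), "foo and foo_2 must differ in size");
/* ...or if a field shared with version 1 moved. */
_Static_assert(offsetof(foo_2, y) == offsetof(foo, y), "shared fields must keep their offsets");

/* Hypothetical library entry point: returns z for version-2 callers and
   a default of 0 for version-1 callers, which never set z. */
int foo_get_z(const void *arg)
{
    /* A pointer to a struct, suitably converted, points to its first member. */
    int size = *(const int *)arg;
    if (size == (int)sizeof(foo_2))
        return ((const foo_2 *)arg)->z;
    return 0;
}
```

With the padding member in place, an application compiled against version 1 still stores sizeof(foo) in size, so the library can tell the two apart even on platforms where the bare structs would have had the same size.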

I suggest the use of an intermediate structure.
For example:
typedef struct
{
int version;
void* data;
} foo_interface;
typedef struct
{
char* x;
int y;
} foo;
typedef struct
{
char* x;
int y;
int z;
} foo_2;
In my library version 2, I would export by name the following function:
#include <stdlib.h>
foo_interface* getFooObject(void)
{
foo_interface* objectWrapper = malloc(sizeof(foo_interface));
foo_2* realObject = malloc(sizeof(foo_2));
/* Fill foo_2 with random data... */
realObject->x = malloc(1 * sizeof(char));
realObject->y = 2;
realObject->z = 3;
/* Fill our interface (both are pointers, so use ->). */
objectWrapper->version = 2; /* Here we specify version 2. */
objectWrapper->data = (void*)realObject;
/* Return our wrapped data. */
return (objectWrapper);
}
Then in the main application I would do:
#include <stdio.h>
int main(int ac, char **av)
{
/* Load library + Retrieve getFooObject() function here. */
foo_interface* objectWrapper = myLibrary.getFooObject();
switch (objectWrapper->version)
{
case 1:
{ /* braces give each case its own scope for realObject */
foo* realObject = (foo*)(objectWrapper->data);
/* Do something with foo here. */
break;
}
case 2:
{
foo_2* realObject = (foo_2*)(objectWrapper->data);
/* Do something with foo_2 here. */
break;
}
default:
printf("Unknown foo version!\n");
break;
}
return (0);
}
As usual, security checks (when allocating memory for example) are not included for readability of the code.
Also, I would use stdint.h to get fixed-width integer types, so that integer members have the same size across different architectures. For example, instead of int I would use int32_t. (This only covers integers: the sizes of double and of pointers such as char* are still up to the platform.)
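A minimal sketch combining fixed-width integers with an explicit version member placed first, which also speaks to the follow-on question about a version number versus sizeof. It assumes C11, and the names foo_v1, foo_v2 and foo_v_get_z are hypothetical. The pointer member from the question is left out because stdint.h cannot fix a pointer's size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version-tagged layouts using fixed-width integers. */
typedef struct {
    uint32_t version;   /* always the first member, so it sits at offset 0 */
    int32_t y;
} foo_v1;

typedef struct {
    uint32_t version;
    int32_t y;
    int32_t z;          /* new in version 2 */
} foo_v2;

_Static_assert(offsetof(foo_v2, y) == offsetof(foo_v1, y),
               "fields shared with version 1 must not move");

/* Returns z for version 2 or later, and 0 for version-1 callers. */
int32_t foo_v_get_z(const void *arg)
{
    uint32_t version = *(const uint32_t *)arg;   /* read the first member */
    if (version >= 2)
        return ((const foo_v2 *)arg)->z;
    return 0;
}
```

Unlike a size check, an explicit version number cannot collide by accident when padding happens to absorb a new member.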

Related

Is a struct copied when a stack-variable is initialized by a result of a function call?

Given I'll return a large struct in a function like here:
#include <stdio.h>
// this is a large struct
struct my_struct {
int x[64];
int y[64];
int z[64];
};
struct my_struct get_my_struct_from_file(const char *filename) {
int tmp1, tmp2; // some tmp. variables
struct my_struct u;
// ... load values from filename ...
return u;
}
int main() {
struct my_struct res = get_my_struct_from_file("tmp.txt"); // <-- here
printf("x[0] = %d\n", res.x[0]);
// ... print all values ...
}
At the place marked by here, do I have to assume that this large struct is copied or is it likely that the compiler does something to avoid this?
… do I have to assume that this large struct is copied…
No, of course you do not have to make that assumption. Nobody requires you to make that assumption, and it would be unwise to adopt the statement as an assumption rather than deriving it from known information, such as compiler documentation or inspection of the generated assembly code.
In the specific code you show, it is likely that good compilers will optimize so that the structure is not copied. (Testing with Apple Clang 11 confirms it does this optimization.) But that is likely overly simplified code. If a call to get_my_struct_from_file appears in a translation unit separate from its definition, the compiler will not know what get_my_struct_from_file is accessing. If the destination object, res in this example, has had its address previously passed to some other routine in some other translation unit, then the compiler cannot know that other routine did not stash the address somewhere and that get_my_struct_from_file is not using it. So the compiler would have to treat the structure returned by get_my_struct_from_file and the structure the return value is being assigned to as separate; it could not coalesce them to avoid the copy.
To ensure the compiler does what you want, simply tell it what you want it to do. Write the code so that the function puts the results directly in the structure you want to put it in:
void get_my_struct_from_file(struct my_struct *result, const char *filename)
{
…
}
...
get_my_struct_from_file(&res, "tmp.txt");
At the place marked by here, do I have to assume that this large struct is copied or is it likely that the compiler does something to avoid this?
Semantically, the structure is copied from the function's local variable to the caller's variable. These are distinct objects, and just like objects of other types, setting one structure equal to another requires copying from the representation of one to the representation of the other.
The only way to avoid a copy would be for the compiler to treat the local variable as an alias for the caller's structure, but that would be wrong in the general case. Such aliasing can easily produce observably different behavior than would occur without it.
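A small sketch of how that aliasing would change behavior. It assumes the destination's address has already escaped into a global; stash, struct big and make_big are hypothetical names. Because u and the caller's object are distinct, the read through stash sees the destination's old value. If the compiler had built u directly in the caller's object, the writes u = {0} and u.b = 1 would already be visible there and u.a[0] would come out as 1 instead.

```c
/* A large struct, as in the question. */
struct big {
    int a[64];
    int b;
};

/* Hypothetical escaped pointer, e.g. stored earlier by another module. */
static struct big *stash;

struct big make_big(void)
{
    struct big u = {0};
    u.b = 1;
    /* Reads the caller's destination object, which still holds its old
       contents because u is a separate object. */
    u.a[0] = stash->b;
    return u;
}
```

A compiler may only construct the result in place when it can prove no such access is possible, which is why passing a pointer to the destination is the reliable way to avoid the copy.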
It is possible that in some specific cases, the compiler can indeed avoid the copy, but if you want to ensure that no copying happens then you should set up the wanted aliasing explicitly:
void get_my_struct_from_file(const char *filename, struct my_struct *u) {
int tmp1, tmp2; // some tmp. variables
// ... load values from filename into *u
}
int main() {
struct my_struct res = { 0 };
get_my_struct_from_file("tmp.txt", &res);
printf("x[0] = %d\n", res.x[0]);
// ... print all values ...
}

how to verify the type of a value passed to a function as void pointer?

Having the following piece of code:
File: types.h
typedef struct Struct_A_T
{
int A;
char B;
float C;
}Struct_A;
File: code.c
#include "types.h"
void Function(const void *const ptr)
{
Struct_A localStruct = *((Struct_A *)ptr);
localStruct.A = 1000;
localStruct.B = 250;
localStruct.C = 128.485;
}
File: main.c
#include "types.h"
void Function(const void *const ptr);
int main(void)
{
Struct_A MyStruct1 = {2, 5, 2.8};
float local = 24.785;
/* Correct call */
Function(&MyStruct1);
/* Incorrect call!!! */
Function(&local);
}
Knowing that a pointer to void can be used as a "generic" pointer, how can I detect inside "Function" that the pointer refers to the correct type, in order to avoid the run-time error caused by the last call in main.c?
There's no way to do it using language features. It can only be done manually.
I, for one, use the following technique in debug builds of the code
typedef struct Struct_A_T
{
int A;
char B;
float C;
#ifdef DEBUG
unsigned signature;
#endif /* DEBUG */
}Struct_A;
i.e. in debug configuration I introduce an additional field into the structure. Each object of that struct type has to have that field initialized with some pre-determined "unpredictable" signature value specific for this type, like
#define STRUCT_A_SIGNATURE 0x12345678
which is easy to do if all structures are created in some centralized fashion (like allocated dynamically or initialized by a dedicated function). This might be more cumbersome if there's no such centralized location. But that's the price we sometimes have to pay for safety. For example, in your case that would be
Struct_A MyStruct1 = {2, 5, 2.8, 0x12345678 };
BTW, designated initializers can make such initializations more robust and easier to read.
And then, in order to convert pointers from void * to the specific type I use the following cast macro
#include <assert.h>
#ifdef DEBUG
#define TO_STRUCT_A(p)\
(assert((p) == NULL || ((Struct_A *)(p))->signature == STRUCT_A_SIGNATURE),\
(Struct_A *)(p))
#else /* DEBUG */
#define TO_STRUCT_A(p) ((Struct_A *)(p))
#endif /* DEBUG */
meaning that inside your Function you'd do
Struct_A localStruct = *TO_STRUCT_A(ptr);
which with very high probability will trigger assertion failure if a pointer to wrong type is passed to Function.
This all can (and should) be implemented using a more generic set of macros, of course.
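One possible generic form, as a sketch: the macro names SIGNATURE_MATCHES and CHECKED_CAST and the function get_A are hypothetical, and every checked type is assumed to have an unsigned member named signature plus its own signature constant. The signature member is unconditional here to keep the example short; real code would wrap it in the #ifdef DEBUG guards shown above. Like the original macro, this evaluates p more than once, so p should not be an expression with side effects.

```c
#include <assert.h>
#include <stddef.h>

#define STRUCT_A_SIGNATURE 0x12345678u

/* True if p is NULL or points to an object carrying the expected signature. */
#define SIGNATURE_MATCHES(T, p, sig) \
    ((p) == NULL || ((const T *)(p))->signature == (sig))

/* Asserts the signature, then yields the converted pointer. */
#define CHECKED_CAST(T, p, sig) \
    (assert(SIGNATURE_MATCHES(T, p, sig)), (const T *)(p))

typedef struct Struct_A_T {
    int A;
    char B;
    float C;
    unsigned signature;   /* debug-only in the answer above; always present here */
} Struct_A;

/* Hypothetical consumer of a void pointer. */
int get_A(const void *const ptr)
{
    const Struct_A *a = CHECKED_CAST(Struct_A, ptr, STRUCT_A_SIGNATURE);
    return a->A;
}
```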
Obviously, this only works for struct types into which you can inject that additional signature field. Another potential problem with this approach is that by introducing an extra field into the structure in debug builds, one can potentially cause the behavior of debug and release builds to diverge.
You can't, that's the primary downside of void*, you have no way to determine what is being pointed to. You just have to know.
If the caller also passes the size of the pointed-to object (a void* alone carries no size information), you can compare it with sizeof(Struct_A). This is a necessary but not sufficient condition for a pointer to your struct type having been passed to Function. BTW, the ability of a language to inspect metadata (including types) of variables and other entities is called reflection. It is available in many modern languages, C++ has some pieces of it (RTTI, plus the compile-time type traits added in C++11), but C still lacks it.

Generate version ID of struct definition?

Basically, what I want is some kind of compile-time generated version that is associated with the exact definition of a struct. If the definition of the struct changes in any way (field added, moved, maybe renamed), I want that version to change, too.
Such a version constant would be useful when reading in a previously serialized struct, to make sure that it's still compatible. The alternative would be manually keeping track of a manually specified constant, which has potentially confusing effects if incrementing it is forgotten (deserializing produces garbage), and also raises the question when exactly to increment it (during development and testing, or only during some kind of release).
This could be achieved by using an external tool to generate a hash over the struct definition, but I'm wondering if it is possible with the C compiler (and/or maybe its preprocessor) itself.
This is actually some form of introspection and so I suspect that this may not be possible at all in ANSI C, but I would be happy with a solution that works with gcc and clang.
The Windows API used to (still does?) have a size member as one of the first members of a struct, so that it knew what version of the struct it was being passed (see WNDCLASSEX as an example):
struct Foo
{
size_t size;
char *bar;
char *baz;
/* Other fields */
};
And before calling you set the size using sizeof:
struct Foo f;
f.size = sizeof(struct Foo);
f.bar = strdup("hi");
f.baz = strdup("there");
somefunc(&f);
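Then somefunc would know, based on the size member, which version of the struct it was dealing with. Because sizeof is evaluated at compile time, the size is baked into the caller's binary: an application built against an older header keeps passing the older, smaller size even when run against a newer library, which is what allows backwards ABI compatibility.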
There is nothing that would do it automatically, but you can build something that works reasonably reliably: you can use sizeof and offsetof, and combine them in such a way that the order in which you combine them matters. Here is an example:
#include <stdio.h>
#include <stddef.h>
#define COMBINE2(a,b) ((a)*31+(b)*11)
#define COMBINE3(a,b,c) COMBINE2(COMBINE2(a,b),c)
#define COMBINE4(a,b,c,d) COMBINE2(COMBINE3(a,b,c),d)
typedef struct A {
int a1;
char a2;
float a3;
} A;
typedef struct B {
int b1;
char b2;
double b3;
} B;
typedef struct C {
char c2;
int c1;
float c3;
} C;
typedef struct D {
int d1;
char d2;
float d3;
int forgotten[2];
} D;
int main(void) {
size_t aSign = COMBINE4(sizeof(A), offsetof(A,a1), offsetof(A,a2), offsetof(A,a3));
size_t bSign = COMBINE4(sizeof(B), offsetof(B,b1), offsetof(B,b2), offsetof(B,b3));
size_t cSign = COMBINE4(sizeof(C), offsetof(C,c1), offsetof(C,c2), offsetof(C,c3));
size_t dSign = COMBINE4(sizeof(D), offsetof(D,d1), offsetof(D,d2), offsetof(D,d3));
printf("%zu %zu %zu %zu\n", aSign, bSign, cSign, dSign);
return 0;
}
This code prints
358944 478108 399864 597272
As you can see, this code produces a constant for each structure (computed only from sizeof and offsetof, so it is actually known at compile time) that reacts to re-ordering of fields of different lengths and to changing fields' types. It also reacts to adding fields, even if you forget to update the list of fields on which you base the computation, as long as the addition changes the struct's overall size; that provides some sort of safety net.
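Since sizeof and offsetof are integer constant expressions, the combined value can be written into a header next to serialized data and checked on load, which is the use case from the question. A minimal sketch, reusing the COMBINE macros and struct A from the example; the buffer format and the functions serialize_a and deserialize_a are hypothetical. Raw struct bytes are only meaningful to a build with the same layout, which is exactly what the ID is meant to detect.

```c
#include <stddef.h>
#include <string.h>

#define COMBINE2(a,b) ((a)*31+(b)*11)
#define COMBINE3(a,b,c) COMBINE2(COMBINE2(a,b),c)
#define COMBINE4(a,b,c,d) COMBINE2(COMBINE3(a,b,c),d)

typedef struct A {
    int a1;
    char a2;
    float a3;
} A;

/* An integer constant expression, usable in static assertions and initializers. */
#define A_LAYOUT_ID ((size_t)COMBINE4(sizeof(A), offsetof(A,a1), offsetof(A,a2), offsetof(A,a3)))
_Static_assert(A_LAYOUT_ID != 0, "layout ID must be non-zero");

/* Hypothetical format: the layout ID followed by the raw struct bytes.
   buf must hold at least sizeof(size_t) + sizeof(A) bytes. */
size_t serialize_a(const A *in, unsigned char *buf)
{
    size_t id = A_LAYOUT_ID;
    memcpy(buf, &id, sizeof id);
    memcpy(buf + sizeof id, in, sizeof *in);
    return sizeof id + sizeof *in;
}

/* Returns 0 on success, -1 if the data was written with a different layout. */
int deserialize_a(A *out, const unsigned char *buf)
{
    size_t id;
    memcpy(&id, buf, sizeof id);
    if (id != A_LAYOUT_ID)
        return -1;
    memcpy(out, buf + sizeof id, sizeof *out);
    return 0;
}
```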

Static allocation of opaque data types

Very often malloc() is absolutely not allowed when programming for embedded systems. Most of the time I can deal with this, but one thing irritates me: it keeps me from using so-called 'opaque types' to enable data hiding. Normally I'd do something like this:
// In file module.h
typedef struct handle_t handle_t;
handle_t *create_handle();
void operation_on_handle(handle_t *handle, int an_argument);
void another_operation_on_handle(handle_t *handle, char etcetera);
void close_handle(handle_t *handle);
// In file module.c
struct handle_t {
int foo;
void *something;
int another_implementation_detail;
};
handle_t *create_handle() {
handle_t *handle = malloc(sizeof(struct handle_t));
// other initialization
return handle;
}
There you go: create_handle() performs a malloc() to create an 'instance'. A construction often used to prevent having to malloc() is to change the prototype of create_handle() like this:
void create_handle(handle_t *handle);
And then the caller could create the handle this way:
// In file caller.c
void i_am_the_caller() {
handle_t a_handle; // Allocate a handle on the stack instead of malloc()
create_handle(&a_handle);
// ... a_handle is ready to go!
}
But unfortunately this code is invalid: in caller.c, handle_t is an incomplete type, so its size isn't known!
I never really found a proper solution to this. I'd really like to know if anyone has a proper way of doing this, or maybe a completely different approach to enable data hiding in C (not using static globals in module.c, of course; one must be able to create multiple instances).
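You can use the _alloca function (spelled alloca, declared in <alloca.h>, on many Unix-like systems). It's not Standard C, but as far as I know, nearly all common compilers provide it. When you call it in the argument list at the call site, as below, the memory comes from the caller's stack frame and is released automatically when the caller returns.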
// Header
typedef struct something something; // incomplete type: callers only ever hold pointers
size_t get_size();
something* create_something(void* mem);
// Usage
something* ptr = create_something(_alloca(get_size())); // or define a macro.
// Implementation
size_t get_size() {
return sizeof(real_handle_type);
}
something* create_something(void* mem) {
real_handle_type* ptr = (real_handle_type*)mem;
// Fill out real_type
return (something*)mem;
}
You could also use some kind of object-pool semi-heap: if you have a maximum number of simultaneously live objects, you can allocate memory for all of them statically and use a bitmask to track which ones are currently in use.
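#include <stdint.h>
#define MAX_OBJECTS 32
real_type objects[MAX_OBJECTS];
uint32_t in_use; // one bit per object, so it needs at least MAX_OBJECTS bits
something* create_something(void) {
for(int i = 0; i < MAX_OBJECTS; i++) {
if (!(in_use & (UINT32_C(1) << i))) { // unsigned shift: 1 << 31 on a signed int is undefined
in_use |= (UINT32_C(1) << i);
return (something*)&objects[i];
}
}
return NULL;
}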
My bit-shifting is a little off, been a long time since I've done it, but I hope that you get the point.
One way would be to add something like
#define MODULE_HANDLE_SIZE (4711)
to the public module.h header. Since that creates a worrying requirement of keeping this in sync with the actual size, the line is of course best auto-generated by the build process.
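A minimal sketch of keeping such a published size honest, assuming C11. The value 32, the accessor handle_get_foo and the extra foo parameter are hypothetical; a generated header would supply the real number. The static assertion in module.c turns a stale MODULE_HANDLE_SIZE into a build error, and copying with memcpy instead of casting the byte array sidesteps alignment and strict-aliasing concerns.

```c
#include <stddef.h>
#include <string.h>

/* ---- module.h: public part ---- */
#define MODULE_HANDLE_SIZE 32   /* hypothetical; would be generated by the build */

typedef struct handle_t {
    unsigned char opaque[MODULE_HANDLE_SIZE];   /* callers never look inside */
} handle_t;

void create_handle(handle_t *handle, int foo);
int handle_get_foo(const handle_t *handle);

/* ---- module.c: private part ---- */
struct handle_impl {
    int foo;
    void *something;
    int another_implementation_detail;
};

/* The build fails if the published size is too small for the real struct. */
_Static_assert(sizeof(struct handle_impl) <= MODULE_HANDLE_SIZE,
               "MODULE_HANDLE_SIZE is out of date");

void create_handle(handle_t *handle, int foo)
{
    struct handle_impl impl = { foo, NULL, 0 };
    memcpy(handle->opaque, &impl, sizeof impl);
}

int handle_get_foo(const handle_t *handle)
{
    struct handle_impl impl;
    memcpy(&impl, handle->opaque, sizeof impl);
    return impl.foo;
}
```

The caller from the question can then write handle_t a_handle; create_handle(&a_handle, 1); with the handle on the stack.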
The other option is of course to actually expose the structure, but document it as being opaque and forbidding access through any other means than through the defined API. This can be made more clear by doing something like:
#include "module_private.h"
typedef struct
{
handle_private_t private;
} handle_t;
Here, the actual declaration of the module's handle has been moved into a separate header, to make it less obviously visible. A type declared in that header is then simply wrapped in the desired typedef name, making sure to indicate that it is private.
Functions inside the module that take handle_t * can safely access private as a handle_private_t value, since it's the first member of the public struct.
Unfortunately, I think the typical way to deal with this problem is by simply having the programmer treat the object as opaque - the full structure implementation is in the header and available, it's just the responsibility of the programmer to not use the internals directly, only through the APIs defined for the object.
If this isn't good enough, a few options might be:
use C++ as a 'better C' and declare the internals of the structure as private.
run some sort of pre-processor on the headers so that the internals of the structure are declared, but with unusable names. The original header, with good names, will be available to the implementation of the APIs that manage the structure. I've never seen this technique used - it's just an idea off the top of my head that might be possible, but seems like far more trouble than it's worth.
have your code that uses opaque pointers declare the statically allocated objects as extern (i.e., globals). Then have a special module that has access to the full definition of the object actually define these objects. Since only the 'special' module has access to the full definition, normal use of the opaque object remains opaque. However, now you have to rely on your programmers not to abuse the fact that these objects are global. You have also increased the chance of naming collisions, so those need to be managed (probably not a big problem, except that it might occur unintentionally - ouch!).
I think overall, just relying on your programmers to follow the rules for the use of these objects might be the best solution (though using a subset of C++ isn't bad either in my opinion). Depending on your programmers to follow the rules of not using the structure internals isn't perfect, but it's a workable solution that is in common use.
One solution is to create a static pool of struct handle_t objects and provide them as necessary. There are many ways to achieve that, but a simple illustrative example follows:
// In file module.c
struct handle_t
{
int foo;
void* something;
int another_implementation_detail;
int in_use ;
} ;
static struct handle_t handle_pool[MAX_HANDLES] ;
handle_t* create_handle()
{
int h ;
handle_t* handle = 0 ;
for( h = 0; handle == 0 && h < MAX_HANDLES; h++ )
{
if( handle_pool[h].in_use == 0 )
{
handle = &handle_pool[h] ;
handle->in_use = 1 ; // mark the slot as taken
}
}
// other initialization
return handle;
}
void release_handle( handle_t* handle )
{
handle->in_use = 0 ;
}
There are faster ways of finding an unused handle; you could for example keep a static index that increments each time a handle is allocated and wraps around when it reaches MAX_HANDLES. This would be faster for the typical situation where several handles are allocated before any is released. For a small number of handles, however, this brute-force search is probably adequate.
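The wrap-around index described above can be sketched as follows (a next-fit search). MAX_HANDLES = 4 and the trimmed-down struct are hypothetical choices for illustration; the search starts just after the most recently allocated slot, so a burst of allocations without releases usually succeeds on the first probe.

```c
#include <stddef.h>

#define MAX_HANDLES 4   /* hypothetical pool size */

struct handle_t {
    int foo;
    int in_use;
};
typedef struct handle_t handle_t;

static handle_t handle_pool[MAX_HANDLES];
static size_t next_index;   /* where the next search starts */

handle_t *create_handle(void)
{
    for (size_t probes = 0; probes < MAX_HANDLES; probes++) {
        size_t i = (next_index + probes) % MAX_HANDLES;   /* wrap around */
        if (!handle_pool[i].in_use) {
            handle_pool[i].in_use = 1;
            next_index = (i + 1) % MAX_HANDLES;
            return &handle_pool[i];
        }
    }
    return NULL;   /* pool exhausted */
}

void release_handle(handle_t *handle)
{
    handle->in_use = 0;
}
```

In the worst case (a nearly full pool) this is still a linear scan, so it only improves the typical case, not the bound.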
Of course the handle itself need no longer be a pointer but could be a simple index into the hidden pool. This would enhance data hiding and protection of the pool from external access.
So the header would have:
typedef int handle_t ;
and the code would change as follows:
// In file module.c
struct handle_s
{
int foo;
void* something;
int another_implementation_detail;
int in_use ;
} ;
static struct handle_s handle_pool[MAX_HANDLES] ;
handle_t create_handle()
{
int h ;
handle_t handle = -1 ;
for( h = 0; handle == -1 && h < MAX_HANDLES; h++ )
{
if( handle_pool[h].in_use == 0 )
{
handle = h ;
handle_pool[h].in_use = 1 ; // mark the slot as taken
}
}
// other initialization
return handle;
}
void release_handle( handle_t handle )
{
handle_pool[handle].in_use = 0 ;
}
Because the returned handle is no longer a pointer to the internal data, an inquisitive or malicious user cannot gain access to it through the handle.
Note that you may need to add some thread-safety mechanisms if you are getting handles in multiple threads.
I faced a similar problem in implementing a data structure in which the header of the data structure, which is opaque, holds all the various data that needs to be carried over from operation to operation.
Since re-initialization might cause a memory leak, I wanted to make sure that the data structure implementation itself never actually overwrites a pointer to heap-allocated memory.
What I did is the following:
/**
* In order to allow the client to place the data structure header on the
* stack we need data structure header size. [1/4]
**/
#define CT_HEADER_SIZE ( (sizeof(void*) * 2) \
+ (sizeof(int) * 2) \
+ (sizeof(unsigned long) * 1) \
)
/**
* After the size has been produced, a type which is a size *alias* of the
* header can be created. [2/4]
**/
struct header { char h_sz[CT_HEADER_SIZE]; };
typedef struct header data_structure_header;
/* In all the public interfaces the size alias is used. [3/4] */
bool ds_init_new(data_structure_header *ds /* , ...*/);
In the implementation file:
struct imp_header {
void *ptr1,
*ptr2;
int i,
max;
unsigned long total;
};
/* implementation proper */
static bool imp_init_new(struct imp_header *head /* , ...*/)
{
return false;
}
/* public interface */
bool ds_init_new(data_structure_header *ds /* , ...*/)
{
int i;
/* only accept a zero init'ed header */
for(i = 0; i < CT_HEADER_SIZE; ++i) {
if(ds->h_sz[i] != 0) {
return false;
}
}
/* just in case we forgot something */
assert(sizeof(data_structure_header) == sizeof(struct imp_header));
/* Explicit conversion is used from the public interface to the
* implementation proper. [4/4]
*/
return imp_init_new( (struct imp_header *)ds /* , ...*/);
}
client side:
int foo()
{
data_structure_header ds = { 0 };
return ds_init_new(&ds /*, ...*/) ? 0 : -1;
}
To expand on some old discussion in comments here, you can do this by providing an allocator function as part of the constructor call.
Given some opaque type typedef struct opaque opaque;, then
Define a function type for an allocator function typedef void* alloc_t (size_t bytes);. In this case I used the same signature as malloc/alloca for compatibility purposes.
The constructor implementation would look something like this:
struct opaque
{
int foo; // some private member
};
opaque* opaque_construct (alloc_t* alloc, int some_value)
{
opaque* obj = alloc(sizeof *obj);
if(obj == NULL) { return NULL; }
// initialize members
obj->foo = some_value;
return obj;
}
That is, the allocator gets provided the size of the opaque object from inside the constructor, where it is known.
For static storage allocation, as is typical in embedded systems, we can create a simple static memory pool like this:
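#include <stddef.h>
#include <stdint.h>
#define MAX_SIZE 100
static _Alignas(max_align_t) uint8_t mempool [MAX_SIZE]; /* C11 */
static size_t mempool_size=0;
void* static_alloc (size_t size)
{
uint8_t* result;
/* Round each request up so every returned block stays suitably aligned. */
const size_t align = _Alignof(max_align_t);
size = (size + align - 1) / align * align;
if(size > MAX_SIZE - mempool_size)
{
return NULL;
}
result = &mempool[mempool_size];
mempool_size += size;
return result;
}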
(This might be allocated in .bss or in your own custom section, whatever is preferred.)
Now the caller can decide how each object is allocated and all objects in for example a resource-constrained microcontroller can share the same memory pool. Usage:
opaque* obj1 = opaque_construct(malloc, 123);
opaque* obj2 = opaque_construct(static_alloc, 123);
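opaque* obj3 = opaque_construct(alloca, 123); // caution: alloca is often a builtin whose address can't be taken, and its memory would be freed when opaque_construct returns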
This is useful for the purpose of saving memory. In case you have multiple drivers in a microcontroller application and each makes sense to hide behind a HAL, they can now share the same memory pool without the driver implementer having to speculate how many instances of each opaque type that will be needed.
Say for example that we have a generic HAL for the UART, SPI and CAN hardware peripherals. Rather than each driver implementation providing its own memory pool, they can all share a centralized section. Otherwise I would normally solve that by exposing a constant such as UART_MEMPOOL_SIZE 5 in uart.h, so that the user can adjust it to however many UART objects they need (like the number of UART hardware peripherals present on some MCU, or the number of CAN bus message objects required for some CAN implementation, etc.). Using #define constants like that is an unfortunate design, since we typically don't want application programmers to mess around with provided standardized HAL headers.
I'm a little confused why you say you can't use malloc(). Obviously on an embedded system you have limited memory and the usual solution is to have your own memory manager which mallocs a large memory pool and then allocates chunks of this out as needed. I've seen various different implementations of this idea in my time.
To answer your question though, why don't you simply statically allocate a fixed-size array of them in module.c, add an "in-use" flag, and then have create_handle() simply return a pointer to the first free element?
As an extension to this idea, the "handle" could then be an integer index rather than the actual pointer which avoids any chance of the user trying to abuse it by casting it to their own definition of the object.
The least grim solution I've seen to this has been to provide an opaque struct for the caller's use, which is large enough, plus maybe a bit, along with a mention of the types used in the real struct, to ensure that the opaque struct will be aligned well enough compared to the real one:
struct Thing {
union {
char data[16];
uint32_t b;
uint8_t a;
} opaque;
};
typedef struct Thing Thing;
Then functions take a pointer to one of those:
void InitThing(Thing *thing);
void DoThingy(Thing *thing,float whatever);
Internally, not exposed as part of the API, there is a struct that has the true internals:
struct RealThing {
uint32_t private1,private2,private3;
uint8_t private4;
};
typedef struct RealThing RealThing;
(This one just has uint32_t and uint8_t; that's the reason for the appearance of these two types in the union above.)
Plus probably a compile-time assert to make sure that RealThing's size doesn't exceed that of Thing:
typedef char CheckRealThingSize[sizeof(RealThing)<=sizeof(Thing)?1:-1];
Then each function in the library does a cast on its argument when it's going to use it:
void InitThing(Thing *thing) {
RealThing *t=(RealThing *)thing;
/* stuff with *t */
}
With this in place, the caller can create objects of the right size on the stack, and call functions against them, the struct is still opaque, and there's some checking that the opaque version is large enough.
One potential issue is that fields could be inserted into the real struct that mean it requires an alignment that the opaque struct doesn't, and this won't necessarily trip the size check. Many such changes will change the struct's size, so they'll get caught, but not all. I'm not sure of any solution to this.
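With C11 the alignment concern can at least be detected: _Alignof gives the required alignment of each type, so a second static assertion next to the size check catches a RealThing that needs stricter alignment than Thing provides. A sketch using the types from this answer, assuming a C11 compiler; the blunt fix when it fires is to add a member of the offending type (or max_align_t) to Thing's union.

```c
#include <stdint.h>

/* Public, opaque version (as in the answer). */
struct Thing {
    union {
        char data[16];
        uint32_t b;
        uint8_t a;
    } opaque;
};
typedef struct Thing Thing;

/* Private, real version (as in the answer). */
struct RealThing {
    uint32_t private1, private2, private3;
    uint8_t private4;
};
typedef struct RealThing RealThing;

/* C11 replacements for the negative-array-size trick. The second one is the
   alignment check: adding, say, a uint64_t or double to RealThing would make
   it fire on most platforms unless Thing's union gains a matching member. */
_Static_assert(sizeof(RealThing) <= sizeof(Thing), "Thing is too small");
_Static_assert(_Alignof(RealThing) <= _Alignof(Thing), "Thing is under-aligned");
```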
Alternatively, if you have a special public-facing header(s) that the library never includes itself, then you can probably (subject to testing against the compilers you support...) just write your public prototypes with one type and your internal ones with the other. It would still be a good idea to structure the headers so that the library sees the public-facing Thing struct somehow, though, so that its size can be checked.
It is simple: put the structs in a privateTypes.h header file. They will not be opaque anymore, but they will still be private to the programmer, since they are inside a private file.
An example here:
Hiding members in a C struct
This is an old question, but since it's also biting me, I wanted to provide here a possible answer (which I'm using).
So here is an example :
// file.h
typedef struct { size_t space[3]; } publicType;
int doSomething(publicType* object);
// file.c
typedef struct { unsigned var1; int var2; size_t var3; } privateType;
int doSomething(publicType* object)
{
privateType* obPtr = (privateType*) object;
(...)
}
Advantages :
publicType can be allocated on stack.
Note that the correct underlying type must be selected in order to ensure proper alignment (i.e., don't use char).
Note also that sizeof(publicType) >= sizeof(privateType).
I suggest a static assert to make sure this condition is always checked.
As a final note, if you believe your structure may evolve later on, don't hesitate to make the public type a bit bigger, to keep room for future expansions without breaking ABI.
Disadvantage :
The casting from public to private type can trigger strict aliasing warnings.
I discovered later on that this method has similarities with struct sockaddr within BSD socket, which meets basically the same problem with strict aliasing warnings.

getting a substruct out of a big struct in C

I'm having a very big struct in an existing program. This struct includes a great number of bitfields.
I wish to save a part of it (say, 10 fields out of 150).
An example code I would use to save the subclass is:
typedef struct {int a;int b;char c;} bigstruct;
typedef struct {int a;char c;} smallstruct;
void substruct(smallstruct *s,bigstruct *b) {
s->a = b->a;
s->c = b->c;
}
int save_struct(bigstruct *bs) {
smallstruct s;
substruct(&s,bs);
return save_smallstruct(&s); // hypothetical: whatever actually writes s out
}
I also wish that selecting which part of it wouldn't be too much hassle, since I wish to change it every now and then. The naive approach I presented before is very fragile and unmaintainable. When scaling up to 20 different fields, you have to change fields both in the smallstruct, and in the substruct function.
I thought of two better approaches. Unfortunately, both require me to use some external CIL-like tool to parse my structs.
The first approach is automatically generating the substruct function. I'll just set the struct of smallstruct, and have a program that would parse it and generate the substruct function according to the fields in smallstruct.
The second approach is building (with C parser) a meta-information about bigstruct, and then write a library that would allow me to access a specific field in the struct. It would be like ad-hoc implementation of Java's class reflection.
For example, assuming no struct-alignment, for struct
struct st {
int a;
char c1:5;
char c2:3;
long d;
};
I'll generate the following meta information:
int field2distance[] = {0,sizeof(int),sizeof(int),sizeof(int)+sizeof(char)};
int field2size[] = {sizeof(int),1,1,sizeof(long)};
int field2bitmask[] = {0,0x1F,0xE0,0};
const char *fieldNames[] = {"a","c1","c2","d"};
I'll get the i-th field with this function:
long getFieldData(void *strct,int i) {
int distance = field2distance[i];
int size = field2size[i];
int bitmask = field2bitmask[i];
void *ptr = ((char *)strct + distance);
long result;
switch (size) {
case 1: //char
result = *(char*)ptr;
break;
case 2: //short
result = *(short*)ptr;
...
}
if (bitmask == 0) return result;
return (result & bitmask) >> num_of_trailing_zeros(bitmask);
}
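getFieldData relies on a helper, num_of_trailing_zeros, that the question never defines. A portable sketch follows; the extra `extract_bits` helper is hypothetical and just mirrors the last line of getFieldData. Note that which bits of the byte a bit-field such as c1 occupies is implementation-defined, so masks like 0x1F/0xE0 assume low-order-first allocation.

```c
#include <assert.h>

/* Count trailing zero bits; returns 0 for mask == 0, which matches
   getFieldData's convention that bitmask 0 means "whole field". */
int num_of_trailing_zeros(unsigned long mask) {
    int n = 0;
    if (mask == 0)
        return 0;
    while ((mask & 1UL) == 0) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: isolate the bits selected by mask and shift
   them down to bit 0. */
long extract_bits(long raw, unsigned long mask) {
    return (long)(((unsigned long)raw & mask) >> num_of_trailing_zeros(mask));
}
```

GCC and Clang also offer `__builtin_ctzl`, which is undefined for 0, so it would need the same zero check.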
Both methods require extra work, but once the parser is in the makefile, changing the substruct is a breeze.
However, I'd rather do this without any external dependencies.
Does anyone have a better idea? Were my ideas any good, and is there an available implementation of them on the internet?
From your description, it looks like you have access to and can modify your original structure. I suggest you refactor your substructure into a complete type (as you did in your example), and then make that structure a member of your big structure, moving all of the subset fields of the original structure into the smaller structure.
Expanding on your small example:
typedef struct
{
int a;
char c;
} smallstruct;
typedef struct
{
int b;
smallstruct mysub;
} bigstruct;
Accessing the smallstruct info would be done like so:
/* stack-based allocation */
bigstruct mybig;
mybig.mysub.a = 1;
mybig.mysub.c = '1';
mybig.b = 2;
/* heap-based allocation (needs <stdlib.h>; check pbig != NULL before use) */
bigstruct *pbig = malloc(sizeof *pbig);
pbig->mysub.a = 1;
pbig->mysub.c = '1';
pbig->b = 2;
You can also pass around pointers to the small struct:
void dosomething(smallstruct * small)
{
small->a = 3;
small->c = '3';
}
/* stack based */
dosomething(&mybig.mysub);
/* heap based */
dosomething(&pbig->mysub);
Benefits:
No Macros
No external dependencies
No memory-order casting hacks
Cleaner code that is easier to read and use.
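With the subset as its own complete type, extracting it for saving becomes a single struct assignment. A minimal sketch, reusing the answer's smallstruct/bigstruct layout; `extract_subset` is a hypothetical name.

```c
#include <assert.h>

typedef struct { int a; char c; } smallstruct;
typedef struct { int b; smallstruct mysub; } bigstruct;

/* Hypothetical helper: struct assignment copies every member of the
   subset at once, so adding a field to smallstruct needs no change here. */
smallstruct extract_subset(const bigstruct *bs) {
    return bs->mysub;
}
```

This removes the hand-written substruct function from the question entirely: the only place to edit when the subset changes is the smallstruct definition.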
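If changing the order of the fields isn't out of the question, you can rearrange the bigstruct fields so that the smallstruct fields are together, and then it's simply a matter of casting from one to the other (possibly adding an offset). Strictly speaking, the C standard only guarantees a matching layout for such a common initial sequence when both structs are members of the same union, so treat the cast as a portability trade-off.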
Something like:
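typedef struct {int a;char c;int b;} bigstruct;
typedef struct {int a;char c;} smallstruct;
/* hypothetical: writes the subset to storage */
int write_smallstruct(const smallstruct *s);
int save_struct(bigstruct *bs) {
return write_smallstruct((smallstruct *)bs);
}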
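A variant that combines this with the previous answer: if the subset is declared as the first member of the big struct, the standard does guarantee that a pointer to the struct, suitably converted, points to its first member (C11 6.7.2.1p15). A sketch, with the hypothetical helper name `as_small`:

```c
#include <assert.h>

typedef struct { int a; char c; } smallstruct;

typedef struct {
    smallstruct sub;   /* must stay the first member for the cast below */
    int b;
} bigstruct;

/* Hypothetical helper: a bigstruct pointer converted to a pointer to
   its first member's type points at that member. */
const smallstruct *as_small(const bigstruct *bs) {
    return (const smallstruct *)bs;
}
```

In practice `&bs->sub` does the same thing without a cast; the cast form only matters when the receiving code is given a bigstruct pointer as an opaque `void *`.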
Macros are your friend.
One solution would be to move the big struct out into its own include file and then have a macro party.
Instead of defining the structure normally, come up with a selection of macros, such as BEGIN_STRUCTURE, END_STRUCTURE, NORMAL_FIELD and SUBSET_FIELD.
You can then include the file a few times, redefining those macros for each pass. The first pass turns the definitions into a normal structure, with both kinds of field emitted as usual. The second defines NORMAL_FIELD as nothing, which produces your subset. The third generates the code that copies the subset fields over.
You'll end up with a single definition of the structure, that lets you control which fields are in the subset and automatically creates suitable code for you.
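A single-file sketch of the same idea, assuming a list macro (the "X-macro" pattern) in place of re-including a separate header; the NORMAL_FIELD/SUBSET_FIELD roles come from the answer, while `BIG_FIELDS`, `EMIT_MEMBER`, `EMIT_NOTHING` and `EMIT_COPY` are hypothetical names.

```c
#include <assert.h>

/* The one and only field list. SUBSET_FIELD entries go into both
   structs; NORMAL_FIELD entries go into bigstruct only. */
#define BIG_FIELDS(NORMAL_FIELD, SUBSET_FIELD) \
    SUBSET_FIELD(int, a)                       \
    NORMAL_FIELD(int, b)                       \
    SUBSET_FIELD(char, c)

/* Pass 1: every field becomes a member of bigstruct. */
#define EMIT_MEMBER(type, name) type name;
typedef struct { BIG_FIELDS(EMIT_MEMBER, EMIT_MEMBER) } bigstruct;

/* Pass 2: NORMAL_FIELD expands to nothing, leaving only the subset. */
#define EMIT_NOTHING(type, name)
typedef struct { BIG_FIELDS(EMIT_NOTHING, EMIT_MEMBER) } smallstruct;

/* Pass 3: one assignment per subset field. */
#define EMIT_COPY(type, name) s->name = b->name;
void substruct(smallstruct *s, const bigstruct *b) {
    BIG_FIELDS(EMIT_NOTHING, EMIT_COPY)
}
```

Moving a field into or out of the subset is a one-word change (NORMAL_FIELD vs SUBSET_FIELD) in BIG_FIELDS. Bit-fields would need an extra width parameter in each entry.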
To help with your metadata, you can use the offsetof() macro, which also has the benefit of taking care of any padding you may have (it cannot be applied to bit-field members, though).
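A sketch of the question's metadata tables built with offsetof instead of hand-summed sizes. It uses a hypothetical variant of the question's struct st without the bit-fields, since offsetof cannot address them; `field_info`, `FIELD`, `st_fields` and `get_field` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example struct (no bit-fields). */
struct st {
    int a;
    char c;
    long d;
};

/* Per-field metadata; offsetof accounts for any padding. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} field_info;

/* sizeof does not evaluate its operand, so the null pointer is never
   dereferenced here. */
#define FIELD(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

static const field_info st_fields[] = {
    FIELD(struct st, a),
    FIELD(struct st, c),
    FIELD(struct st, d),
};

/* Copy field i of *s into buf (at least st_fields[i].size bytes) and
   return the number of bytes copied. memcpy sidesteps the alignment and
   aliasing issues of casting the address to int *, long *, etc. */
size_t get_field(const struct st *s, size_t i, void *buf) {
    const field_info *f = &st_fields[i];
    memcpy(buf, (const char *)s + f->offset, f->size);
    return f->size;
}
```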
I suggest this approach:
Curse the guy who wrote the big structure. Get a voodoo doll and have some fun.
Mark each field of the big structure that you need somehow (macro or comment or whatever)
Write a small tool which reads the header file and extracts the marked fields. If you use comments, you can give each field a priority or something to sort them.
Write a new header file for the substructure (using a fixed header and footer).
Write a new C file which contains a function createSubStruct which takes a pointer to the big struct and returns a pointer to the substruct
To generate the function body, loop over the collected fields and emit ss->field = bs->field; for each one (i.e. copy the fields one by one).
Add the small tool to your makefile and add the new header and C source file to your build.
I suggest using gawk, or any scripting language you're comfortable with, as the tool; it should take about half an hour to build.
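For illustration, this is roughly what the generated C file could look like for the question's small example, assuming the tool marked fields a and c. The type name `substruct_t` is hypothetical, and the bigstruct definition stands in for the original header.

```c
#include <stdlib.h>

/* Stand-in for the original header. */
typedef struct { int a; int b; char c; } bigstruct;

/* Generated header: fixed header/footer around the marked fields. */
typedef struct { int a; char c; } substruct_t;

/* Generated function: one assignment per marked field.
   The caller owns the result and must free() it. */
substruct_t *createSubStruct(const bigstruct *bs) {
    substruct_t *ss = malloc(sizeof *ss);
    if (ss == NULL)
        return NULL;
    ss->a = bs->a;   /* emitted for marked field a */
    ss->c = bs->c;   /* emitted for marked field c */
    return ss;
}
```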
[EDIT] If you really want to try reflection (which I advise against; it'll be a lot of work to get that working in C), then the offsetof() macro is your friend. This macro returns the offset of a field in a structure (which is often not the sum of the sizes of the fields before it, because of padding). See this article.
[EDIT2] Don't write your own parser. Getting your own parser right will take months; I know, since I've written lots of parsers in my life. Instead, mark the parts of the original header file which need to be copied and then rely on the one parser which you know works: the one in your C compiler. Here are a couple of ideas on how to make this work:
struct big_struct {
/**BEGIN_COPY*/
int i;
int j : 3;
int k : 2;
char * str;
/**END_COPY*/
...
struct x y; /**COPY_STRUCT*/
};
Just have your tool copy anything between /**BEGIN_COPY*/ and /**END_COPY*/.
Use special comments like /**COPY_STRUCT*/ to instruct your tool to generate a memcpy() instead of an assignment, etc.
This can be written and debugged in a few hours. Setting up a C parser that does nothing yet would take just as long: you'd have something that can read valid C, but you'd still have to write the part that understands C and the part that does something useful with the data.
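A sketch of what the tool might produce for the marked struct. `struct x`, the `unrelated` field (standing in for the elided fields), `struct small_struct` and `copy_small` are all hypothetical. Bit-fields j and k have no address, so they must be copied by assignment; the field marked COPY_STRUCT gets a memcpy().

```c
#include <assert.h>
#include <string.h>

struct x { int v[4]; };   /* hypothetical stand-in for struct x */

struct big_struct {
    /**BEGIN_COPY*/
    int i;
    int j : 3;
    int k : 2;
    char * str;
    /**END_COPY*/
    double unrelated;     /* hypothetical field outside the markers */
    struct x y; /**COPY_STRUCT*/
};

/* Generated: the lines between the markers, verbatim, plus y. */
struct small_struct {
    int i;
    int j : 3;
    int k : 2;
    char * str;
    struct x y;
};

/* Generated copy function. */
void copy_small(struct small_struct *ss, const struct big_struct *bs) {
    ss->i = bs->i;
    ss->j = bs->j;        /* bit-field: assignment only, no & or memcpy */
    ss->k = bs->k;
    ss->str = bs->str;    /* shallow copy: the string itself is shared */
    memcpy(&ss->y, &bs->y, sizeof ss->y);   /* from COPY_STRUCT */
}
```

Whether a plain int bit-field is signed is implementation-defined, so copying between identically declared bit-fields, as here, is the safe pattern.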

Resources