Generic header placeholder pointers with casting in implementation - a good idea? - c

I’m building a small multi-platform C library. A design objective is to have a set of header files that are suitably generic, and which define a number of placeholder pointers to structs - if that’s the right term - like this:
/* mylib_types.h */
typedef struct _mylib_matrix *mylib_matrix;
These placeholders can be used to specify the parameters to the function prototypes in the other headers, like:
/* mylib_api.h */
MY_API mylib_status mylibAddMatrix(mylib_matrix a,
mylib_matrix b,
mylib_matrix* result);
So, that’s fine for the headers - everything is self-contained and stand-alone. Then, when it comes to implementing the library I want to use different underlying, platform specific, libraries to actually implement the methods.
The idea being that the library is optimised for any given platform, but the API to the library will be universally defined (so easily cross-compiled).
The problem I have is that, yes, I have got this working, but in a rather crude way, using casts. I just wonder what the best practice - if any - actually is?
For example, in my implementation of a method I must then remember to immediately cast the placeholder pointer to the actual type of thing we are using for that platform's implementation, and similarly cast back any results.
e.g.
/* mylib_matrix.c */
#include "mylib_types.h"
#include "mylib_api.h"
#include <PlatformSpecificFunkyMatrix.h>
MY_API mylib_status mylibAddMatrix(mylib_matrix a,
mylib_matrix b,
mylib_matrix* result)
{
*result = (mylib_matrix)PlatformSpecificFunkyMatrix_AddMatrix(
(PlatformSpecificFunkyMatrix*)a,
(PlatformSpecificFunkyMatrix*)b);
return MYLIB_SUCCESS;
}
This all seems very brittle: I could easily forget a cast, and the casts stop the compiler from doing any type checking. Is it at all principled?
I guess I could be explicit in my types of cast - but that still requires some consideration. Perhaps some pre-processor #defines might help wrap things up, but of course that can get rather messy... I could of course go and redefine the low-level structs (e.g. mylib_matrix) for each implementation, but then we are talking a different set of headers for each platform (again, I could go with the preprocessor to help swap the right definitions in or out).
Hmmm. Maybe I’m dwelling too much upon this...

One way to get around the casting.
In the platform specific file, use:
struct _mylib_matrix
{
PlatformSpecificFunkyMatrix* realMatrix;
};
and
MY_API mylib_status mylibAddMatrix(mylib_matrix a,
mylib_matrix b,
mylib_matrix* result)
{
PlatformSpecificFunkyMatrix* r =
PlatformSpecificFunkyMatrix_AddMatrix(a->realMatrix, b->realMatrix);
/* The wrapper needs its own allocation (requires <stdlib.h>); error handling omitted. */
*result = malloc(sizeof(struct _mylib_matrix));
(*result)->realMatrix = r;
return MYLIB_SUCCESS;
}
Better still...
You can avoid the double indirection and the need for casting by using:
struct _mylib_matrix
{
// Add all the data here that you have in PlatformSpecificFunkyMatrix
};
typedef struct _mylib_matrix PlatformSpecificFunkyMatrix;
and then,
MY_API mylib_status mylibAddMatrix(mylib_matrix a,
mylib_matrix b,
mylib_matrix* result)
{
*result = PlatformSpecificFunkyMatrix_AddMatrix(a, b);
return MYLIB_SUCCESS;
}
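A minimal, self-contained sketch of the wrapper approach above. It assumes a stand-in `PlatformMatrix` type and `PlatformMatrix_Add` function in place of a real platform library; both are hypothetical, as are the 2x2 layout, `MYLIB_NO_MEMORY`, and the create/get/destroy functions. The struct tag is spelled `mylib_matrix_s` here because identifiers that begin with an underscore are reserved at file scope. The handle stays an incomplete type for callers, so no casts are needed and the compiler still checks every argument.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what the public header would contain --- */
typedef struct mylib_matrix_s *mylib_matrix;            /* opaque handle */
typedef enum { MYLIB_SUCCESS = 0, MYLIB_NO_MEMORY = 1 } mylib_status;

/* --- hypothetical stand-in for a platform-specific library --- */
typedef struct { double v[4]; } PlatformMatrix;         /* 2x2, an assumption */

static PlatformMatrix *PlatformMatrix_Add(const PlatformMatrix *a,
                                          const PlatformMatrix *b)
{
    PlatformMatrix *r = malloc(sizeof *r);
    int i;
    if (r != NULL)
        for (i = 0; i < 4; ++i)
            r->v[i] = a->v[i] + b->v[i];
    return r;
}

/* --- implementation file: the only place the handle is completed --- */
struct mylib_matrix_s {
    PlatformMatrix *real;                               /* owned by the handle */
};

mylib_status mylibCreateMatrix(const double values[4], mylib_matrix *out)
{
    mylib_matrix m;
    int i;
    assert(values != NULL && out != NULL);
    m = malloc(sizeof *m);
    if (m == NULL) return MYLIB_NO_MEMORY;
    m->real = malloc(sizeof *m->real);
    if (m->real == NULL) { free(m); return MYLIB_NO_MEMORY; }
    for (i = 0; i < 4; ++i)
        m->real->v[i] = values[i];
    *out = m;
    return MYLIB_SUCCESS;
}

mylib_status mylibAddMatrix(mylib_matrix a, mylib_matrix b, mylib_matrix *result)
{
    mylib_matrix m;
    assert(a != NULL && b != NULL && result != NULL);
    m = malloc(sizeof *m);
    if (m == NULL) return MYLIB_NO_MEMORY;
    m->real = PlatformMatrix_Add(a->real, b->real);     /* no casts required */
    if (m->real == NULL) { free(m); return MYLIB_NO_MEMORY; }
    *result = m;
    return MYLIB_SUCCESS;
}

double mylibGetElement(mylib_matrix m, int index)
{
    assert(m != NULL && index >= 0 && index < 4);
    return m->real->v[index];
}

void mylibDestroyMatrix(mylib_matrix m)
{
    if (m != NULL) {
        free(m->real);
        free(m);
    }
}
```

A caller only ever sees `mylib_matrix`, so each platform can supply its own definition of the struct in its own source file without touching the headers.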

Related

Using different struct definitions to simulate public and private fields in C

I have been writing C for a decent amount of time, and obviously am aware that C does not have any support for explicit private and public fields within structs. However, I believe I have found a relatively clean method of implementing this without the use of any macros or voodoo, and I am looking to gain more insight into possible issues I may have overlooked.
The folder structure isn't all that important here but I'll list it anyway because it gives clarity as to the import names (and is also what CLion generates for me).
- example-project
- cmake-build-debug
- example-lib-name
- include
- example-lib-name
- example-header-file.h
- src
- example-lib-name
- example-source-file.c
- CMakeLists.txt
- CMakeLists.txt
- main.c
Let's say that example-header-file.h contains:
typedef struct ExampleStruct {
int data;
} ExampleStruct;
ExampleStruct* new_example_struct(int, double);
which just contains a definition for a struct and a function that returns a pointer to an ExampleStruct.
Obviously, now if I import ExampleStruct into another file, such as main.c, I will be able to create and return a pointer to an ExampleStruct by calling
ExampleStruct* new_struct = new_example_struct(<int>, <double>);,
and will be able to access the data property like: new_struct->data.
However, what if I also want private properties in this struct? For example, if I am creating a data structure, I don't want it to be easy to modify its internals. I.e. if I've implemented a vector struct with a length property that describes the current number of elements in the vector, I wouldn't want people to be able to change that value easily.
So, back to our example struct, let's assume we also want a double field in the struct, that describes some part of internal state that we want to make 'private'.
In our implementation file (example-source-file.c), let's say we have the following code:
#include <stdlib.h>
#include <stdbool.h>
typedef struct ExampleStruct {
int data;
double val;
} ExampleStruct;
ExampleStruct* new_example_struct(int data, double val) {
ExampleStruct* example_struct = malloc(sizeof(ExampleStruct));
example_struct->data=data;
example_struct->val=val;
return example_struct;
}
double get_val(ExampleStruct* e) {
return e->val;
}
This file simply implements that constructor method for getting a new pointer to an ExampleStruct that was defined in the header file. However, this file also defines its own version of ExampleStruct, that has a new member field not present in the header file's definition: double val, as well as a getter which gets that value. Now, if I import the same header file into main.c, which contains:
#include <stdio.h>
#include "example-lib-name/example-header-file.h"
int main() {
printf("Hello, World!\n");
ExampleStruct* test = new_example_struct(6, 7.2);
printf("%d\n", test->data); // <-- THIS WORKS
double x = get_val(test); // <-- THIS AND THE LINE BELOW ALSO WORK
printf("%f\n", x); //
// printf("%f\n", test->val); <-- WOULD THROW ERROR `val not present on struct!`
return 0;
}
I tested this a couple times with some different fields and have come to the conclusion that modifying this 'private' field, val, or even accessing it without the getter, would be very difficult without using pointer arithmetic dark magic, and that is the whole point.
Some things I see that may be cause for concern:
This may make code less readable in the eyes of some, but my IDE has arrow buttons that take me to and from the definition and the implementation, and even without that, a one line comment would provide more than enough documentation to point someone in the direction of where the file is.
Questions I'd like answers on:
Are there significant performance penalties I may suffer as a result of writing code this way?
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Aside: I am not trying to make C into C++, and generally favor the way C does things, but sometimes I really want some encapsulation of data.
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.
Yes: your approach produces undefined behavior.
C requires that
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
(C17 6.2.7/2)
and that
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
[...]
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
(C17 6.5/7, a.k.a. the "Strict Aliasing Rule")
Your two definitions of struct ExampleStruct define incompatible types because they specify different numbers of members (see C17 6.2.7/1 for more details on structure type compatibility). You will definitely have problems if you pass instances by value between functions that rely on different ones of these incompatible definitions. You will have trouble if you construct arrays of them, whether dynamically, automatically, or statically, and attempt to use those across boundaries between TUs using one definition and those using another. You may have problems even if you do none of the above, because the compiler may behave unexpectedly, especially when optimizing. DO NOT DO THIS.
Other alternatives:
Opaque pointers. This means you do not provide any definition of struct ExampleStruct in those TUs where you want to hide any of its members. That does not prevent declaring and using pointers to such a structure, but it does prevent accessing any members, declaring new instances, or passing or receiving instances by value. Where member access is needed from TUs that do not have the structure definition, it would need to be mediated by accessor functions.
Just don't access the "private" members. Do not document them in the public documentation, and if you like, explicitly mark them (in code comments, for example) as reserved. This approach will be familiar to many C programmers, as it is used a lot for structures declared in POSIX system headers.
As long as the public has a complete definition for ExampleStruct, it can make code like:
ExampleStruct a = *new_example_struct(42, 1.234);
Then the below will certainly fail.
printf("%g\n", get_val(&a));
I recommend instead creating an opaque pointer and providing public accessor functions for the info in .data and .val.
Think of how we use FILE. FILE *f = fopen(...) and then fread(..., f), fseek(f, ...), ftell(f) and eventually fclose(f). I suggest this model instead. (Even if in some implementations FILE* is not opaque.)
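A minimal sketch of that opaque-pointer model, assuming the header and the implementation are shown together in one listing for brevity. The function names (`example_struct_new`, `example_struct_val`, and so on) are hypothetical. Callers see only the incomplete type, so they can hold an `ExampleStruct *` but cannot dereference it, copy it by value, or reach `val` except through the accessor.

```c
#include <assert.h>
#include <stdlib.h>

/* --- public header: incomplete type plus accessor functions --- */
typedef struct ExampleStruct ExampleStruct;

ExampleStruct *example_struct_new(int data, double val);
void example_struct_free(ExampleStruct *e);
int example_struct_data(const ExampleStruct *e);
void example_struct_set_data(ExampleStruct *e, int data);
double example_struct_val(const ExampleStruct *e);

/* --- implementation file: the single, complete definition --- */
struct ExampleStruct {
    int data;       /* readable and writable through accessors */
    double val;     /* read-only from the caller's point of view */
};

ExampleStruct *example_struct_new(int data, double val)
{
    ExampleStruct *e = malloc(sizeof *e);
    if (e != NULL) {
        e->data = data;
        e->val = val;
    }
    return e;       /* NULL on allocation failure */
}

void example_struct_free(ExampleStruct *e)
{
    free(e);        /* free(NULL) is a no-op */
}

int example_struct_data(const ExampleStruct *e)
{
    assert(e != NULL);
    return e->data;
}

void example_struct_set_data(ExampleStruct *e, int data)
{
    assert(e != NULL);
    e->data = data;
}

double example_struct_val(const ExampleStruct *e)
{
    assert(e != NULL);
    return e->val;
}
```

Because there is only one definition of the struct, there is no type-compatibility problem, and an attempt such as `ExampleStruct a = *p;` in client code is rejected at compile time.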
Are there significant performance penalties I may suffer as a result of writing code this way?
Probably:
Heap allocation is expensive, and - today - usually not optimized away even when that is theoretically possible.
Dereferencing a pointer for member access is expensive; although this might get optimized away with link-time-optimization... if you're lucky.
i.e. is there a simpler way to do this
Well, you could use a slack array of the same size as your private fields, and then you wouldn't need to go through pointers all the time:
#include <stddef.h> /* max_align_t */
#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)
typedef struct ExampleStruct {
int data;
_Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;
This is basically a type-erasure of the private data without hiding the fact that it exists. Now, it's true that someone can overwrite the contents of this array, but it's kind of useless to do it intentionally when you "don't know" what the data means. Also, the private data in the "real" definition will need to have the same, maximal, _Alignas() as well (if you want the private data not to need to use _Alignas(), you will need to use the real alignment quantum for the type-erased version).
The above is C11. You can sort of do about the same thing by typedef'ing max_align_t yourself, then using an array of max_align_t elements for private data, with an appropriate length to cover the actual size of the private data.
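One way to use the reserved bytes without a second definition of the struct is sketched below. It is a variation on the idea above, not the answer's exact method: the implementation keeps a separate private record (the hypothetical `struct example_private`) and copies it in and out of `private_data` with `memcpy`, and a C11 `_Static_assert` checks that the reserved space is large enough. The function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>   /* max_align_t, NULL */
#include <string.h>   /* memcpy */

#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)

/* Public definition: the only definition of ExampleStruct anywhere. */
typedef struct ExampleStruct {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;

/* Private record, known only to the implementation file (hypothetical name). */
struct example_private {
    double val;
};

/* Break the build if the private fields outgrow the reserved bytes. */
_Static_assert(sizeof(struct example_private) <= EXAMPLE_STRUCT_PRIVATE_DATA_SIZE,
               "private_data is too small for the private fields");

void example_struct_init(ExampleStruct *e, int data, double val)
{
    struct example_private p;
    assert(e != NULL);
    p.val = val;
    e->data = data;
    memcpy(e->private_data, &p, sizeof p);   /* store private state as bytes */
}

double example_struct_val(const ExampleStruct *e)
{
    struct example_private p;
    assert(e != NULL);
    memcpy(&p, e->private_data, sizeof p);   /* read it back without aliasing tricks */
    return p.val;
}
```

Callers can declare an `ExampleStruct` on the stack or in an array, so no heap allocation or pointer indirection is forced on them; the cost is that the size of the private part becomes part of the public interface.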
An example of the use of such an approach can be found in CUDA's driver API:
Parameters for copying a 3D array: CUDA_MEMCPY3D vs
Parameters for copying a 3D array between two GPU devices: CUDA_MEMCPY3D_peer
The first structure has a pair of reserved void* fields, hiding the fact that it's really the second structure. They could have used an unsigned char array, but it so happens that the private fields are pointer-sized, and void* is also kind of opaque.
This causes undefined behaviour, as detailed in the other answers. The usual way around this is to make a nested struct.
In example.h, one defines the public-facing elements. struct example is not meant to be instantiated; in a sense, it is abstract. Only pointers that are obtained from one of its constructors (in this case, the only one) are valid.
struct example { int data; };
struct example *new_example(int, double);
double example_val(struct example *e);
and in example.c, instead of re-defining struct example, one has a nested struct private_example. (Such that they are related by composite aggregation.)
#include <stddef.h> /* offsetof */
#include <stdlib.h>
#include "example.h"
struct private_example {
struct example public;
double val;
};
struct example *new_example(int data, double val) {
struct private_example *const example = malloc(sizeof *example);
if(!example) return 0;
example->public.data = data;
example->val = val;
return &example->public;
}
/** This is a poor version of `container_of`. */
static struct private_example *example_upcast(struct example *example) {
return (struct private_example *)(void *)
((char *)example - offsetof(struct private_example, public));
}
double example_val(struct example *e) {
return example_upcast(e)->val;
}
Then one can use the object as in main.c. This is used frequently in Linux kernel code for container abstraction. Note that offsetof(struct private_example, public) is zero, ergo example_upcast does nothing and a cast is sufficient: ((struct private_example *)e)->val. If one builds structures in a way that always allows casting, one is limited to single inheritance.
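The cast-only variant mentioned above can be sketched as follows. It relies on the public struct being the first member, which the C11 `_Static_assert` enforces; `delete_example` is a hypothetical addition, not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */
#include <stdlib.h>

struct example { int data; };            /* public part, as in example.h */

struct private_example {
    struct example public;               /* must remain the first member */
    double val;
};

/* Refuse to compile if someone moves `public` away from offset 0. */
_Static_assert(offsetof(struct private_example, public) == 0,
               "public must be the first member of private_example");

struct example *new_example(int data, double val)
{
    struct private_example *const ex = malloc(sizeof *ex);
    if (!ex) return NULL;
    ex->public.data = data;
    ex->val = val;
    return &ex->public;
}

double example_val(struct example *e)
{
    assert(e != NULL);
    /* A pointer to the first member converts back to a pointer to the struct. */
    return ((struct private_example *)e)->val;
}

void delete_example(struct example *e)
{
    free(e);   /* same address as the block malloc returned */
}
```

This only stays valid for pointers that really came from `new_example`; a `struct example` that a caller declared directly has no `val` behind it.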

Object Oriented C like code - Identification of caller without parameters

If I have this setup:
#include <stdlib.h>
#define NEW_FOO ((foo_t*)malloc(sizeof(foo_t)))
void foo_func(void);
typedef struct {
void (*foo) (void);
} foo_t;
int main(void) {
foo_t *a = NEW_FOO;
foo_t *b = NEW_FOO;
a->foo = foo_func;
b->foo = foo_func;
a->foo();
b->foo();
}
void foo_func(void) {
// determine whether a or b was called?
}
Can I then find out whether a or b was the caller of foo_func, strictly without a parameter like self, this, ...?
The return address should be on the stack, so you should be able to identify the caller somehow, no?
I thought of a possible approach (it builds upon the idea above): The first time foo_func is called (maybe through an initialization function, but let's leave that out to keep it simple) through a->foo(), store the address of struct a in some sort of array of pointers, I would assume. Same with b->foo(). Then, any time that a->foo() or b->foo() is called, you would compare the address of the caller struct with the contents of the array to identify whether it was a or b that called foo_func().
It's just that I have no idea if and/or how that is possible, so if any of you could help me with this, I would be very glad!
I guess you're annoyed about the unsightliness of constructions like:
a->foo (a, arg0, arg1);
b->bar (b, arg0);
Unfortunately, the style of programming you've adopted does force this style on you, if you want to implement a simulation of polymorphic methods. Maybe you can implement a set of macros so you can write something like:
METHOD_CALL2 (foo, a, arg0, arg1);
METHOD_CALL1 (bar, b, arg0);
and so not have to repeat the "object" names a, b, etc., in the call. I've seen this done as well but, in my view, it doesn't look any prettier, and I'm sure it's no more maintainable.
As this is C, not C++, in the end you're going to have to have some way to pass your equivalent of this to the "methods" in your implementation. You might be able to disguise it with macros and variable-length argument lists, but it's going to have to happen somehow.
But why worry? This is idiomatic C code -- every application and library that takes an object-oriented approach to C will be using constructions of the form you want to avoid. People will understand what you're doing. Trying to disguise it will not make your code easier to follow, I suspect.
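A minimal sketch of that idiom, assuming an `id` field and a `foo_new` constructor that the question does not have (both hypothetical). The return address on the stack only identifies the call site in the code, not which struct the function pointer was read from, so the object is passed explicitly as `self`. The `CALL0` macro is a hypothetical convenience that names the object once.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct foo foo_t;

struct foo {
    int id;                        /* per-object state (assumption) */
    int (*foo)(foo_t *self);       /* "method": receives its object explicitly */
};

/* One function shared by every object; `self` says which object it serves. */
static int foo_func(foo_t *self)
{
    assert(self != NULL);
    return self->id;
}

foo_t *foo_new(int id)
{
    foo_t *f = malloc(sizeof *f);
    if (f != NULL) {
        f->id = id;
        f->foo = foo_func;
    }
    return f;                      /* NULL on allocation failure */
}

/* Names the object once; note that obj is evaluated twice. */
#define CALL0(obj, method) ((obj)->method(obj))
```

With this in place, `a->foo(a)` and `CALL0(b, foo)` both reach `foo_func`, and the function can tell the two objects apart through `self`.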

Static Asserts for identifying broken auto generated interface layers in ANSI C

Question
I am trying to find static (compile-time) asserts to ensure, as well as possible, the things below. As I use them in an auto code generation context (see “Background” below), they do not have to be neat; they only have to break compilation, ideally with zero overhead. Elegant variants are welcome, though.
The following things shall be checked:
A Type Identity
typedef T T1;
typedef T T2;
typedef X T3;
T1 a;
T2 b;
T3 c;
SA_M1(T1,T2); /* compilation */
SA_M1(T1,T3); /* compilation error */
SA_M2(a,b); /* compilation */
SA_M2(a,c); /* compilation error */
where X and T are C Types (including structured, aggregated, object pointer, not so important function pointer). Note again, that a set of partly successful solutions also helps.
Some solutions that I assume will partly work:
comparing the sizes
checking if the type is a pointer as claimed by trying to dereference it.
for unsigned integers: compare a cast, slightly too big value with the expected wrap-around value.
for floats: compare a value that is exactly representable in double precision with the cast one (hoping for the best regarding platform-specific rounding operations)
B A Variable has global Scope
My solution here is at the moment simply to generate a static function that tries to take the address of the global variable. Assume that X is a global variable:
static void SA_IsGlobal_X() {(void) (&X == NULL); /* Dummy Operation */}
C A Function has the correct number of parameters
I have no idea yet.
D If the prototype of a functions is as it is expected
I have no idea yet.
E If function or macro parameters are compile-time constants
This question is discussed here for macros:
Macro for use in expression while enforcing its arguments to be compile time constants
For functions, a wrapper macro could do.
Z Other things you might like to check considering the “background” part below
Preferred are answers that can be done with C89, have zero costs in runtime, stack and (with most compilers) code size. As the checks will be auto generated, readability is not so important, but I like to place the checks in static functions, whenever possible.
Background:
I want to provide C functions as well as an interface generator that allows them to be smoothly integrated into different C frameworks (with C++ on the horizon). The user of the interface generator then only specifies where the inputs come from, and which of the outputs shall go where. Options are at least:
RAW (as it is implemented - and should be used)
from the interface functions parameter, which is of a type said to be the same as my input/output (and perhaps is a field of a structure or an array element)
from a getter/setter function
from a global variable
using a compile time constant
I will:
ask for a very detailed interface specification (including specification errors)
use parsers to check typedefs and declarations (including tool bugs and my tool usage errors)
But this happens at generation time. Besides everything else: if the user either changes the environment or takes a new major version of my function (this can be solved by macros checking versions) without running the interface generator again, I would like to have a last line of defense at compile time.
The resulting code of the generations might near worst case be something like:
#include "IFMyFunc.h" /* contains all user headers for the target framework(s) */
#include "MyFunc.h"
RetType IFMYFunc(const T1 a, const struct T2 * const s, T3 * const c)
{
/* CHECK INTERFACE */
CheckIFMyFunc();
/* get d over a worst case parametrized getter function */
const MyD_type d = getD(s->dInfo);
/* do horrible call by value and reference stuff, f and g are global vars */
c->c1 = MyFunc(a,s->b,c->c1,d,f,&(c->c2), &e,&g);
set(e);
/* return something by return value */
return e;
}
(I am pretty sure I will restrict the combos though).
static void CheckIFMyFunc(void)
{
/* many many compile time checks of types and specifications */
}
or I will provide a piece of code (local block) to be directly infused - which is horrible architecture, but might be necessary if we can't abandon some of the framework fast enough, supported by some legacy scripts.
For A, I would propose:
#define SA_M1(A, B) \
do { \
A ___a; \
B ___b = ___a; \
(void)___b; \
} while (0)
for D (and I would say that C is already done by D)
typedef int (*myproto)(int a, char **c);
#define FN_SA(Ref, Challenger) \
do { \
Ref ___f = Challenger; \
(void) ___f; \
} while (0)
void test(int argc, char **argv);
int main(int argc, char **argv)
{
FN_SA(myproto, main);
FN_SA(myproto, test); /* Does not compile */
return 0;
}
Nevertheless, there are some remaining problems with void *:
any pointer may be cast to/from void * in C, which will probably make the solution for A fail in some cases....
BTW, if you plan to use C++ in the medium term, you could just use C++ templates and so on to have these tests done there. Would be far more clean and reliable IMHO.
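A few further C89-compatible checks, sketched under stated assumptions. Assigning values, as in the macro for A above, also accepts implicit arithmetic conversions (an `int` initialises a `double` without complaint), so this variant assigns pointers instead: pointers to incompatible types are not assignment-compatible, which compilers diagnose (some only as a warning unless warnings are promoted to errors). The `void *` caveat still applies. `SA_EXPR`, `my_func`, and `CheckIF` are hypothetical names; the macro name `SA_M1` is reused from the question.

```c
/* Compile-time assertion usable as a statement: a false condition gives an
   array of size -1, which does not compile. sizeof is never evaluated at
   run time, so this costs nothing. */
#define SA_EXPR(cond) ((void)sizeof(char[(cond) ? 1 : -1]))

/* Partial type-identity check for A: pointer assignment plus size equality. */
#define SA_M1(A, B)                              \
    do {                                         \
        A *sa_pa = 0;                            \
        B *sa_pb = sa_pa;                        \
        (void)sa_pb;                             \
        SA_EXPR(sizeof(A) == sizeof(B));         \
    } while (0)

typedef unsigned long T;
typedef T T1;
typedef T T2;

/* Stand-in for a function from the user's framework (hypothetical). */
int my_func(int a, char **c)
{
    (void)c;
    return a + 1;
}

/* Checks C and D: the generator re-declares the function with the prototype
   it expects. A re-declaration with a different return type, parameter count,
   or parameter types conflicts with the original and breaks compilation. */
extern int my_func(int, char **);

static int CheckIF(void)
{
    SA_M1(T1, T2);                               /* compiles: same type */
    SA_EXPR(sizeof(T1) == sizeof(unsigned long));
    /* SA_M1(T1, double);  would be diagnosed: incompatible pointer types */
    return 1;
}
```

The checks generate no data and no calls; an optimising compiler reduces `CheckIF` to a function that just returns.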

How to convert GMP C parameter convention into something more natural?

For example, I would like to do something like this:
#include <gmp.h>
typedef mpz_t Integer;
//
Integer F(Integer a,Integer b,Integer c,Integer d) {
Integer ret = times(plus(a,b),plus(c,d));
}
But, GMP doesn't let me do this, apparently mpz_t is an array, so I get the error:
error: ‘F’ declared as function returning an array
So instead I would have to do something like this:
void F(Integer ret,Integer a,Integer b,Integer c,Integer d) {
Integer tmp1,tmp2;
plus(tmp1,a,b);
plus(tmp2,c,d);
times(ret,tmp1,tmp2);
}
This is unnatural, and not following the logical way that C (or in general mathematical) expressions can be composed. In fact, you can't compose anything in a math-like way because apparently you can't return GMP numbers! If I wanted to write - for example - a simple yacc/bison style parser that converted a simple syntax using +, -, /, * etc. into C code implementing the given expressions using GMP it seems it would be much more difficult as I would have to keep track of all the intermediate values.
So, how can I force GMP to bend to my will here and accept a more reasonable syntax? Can I safely "cheat" and cast mpz_t to a void * and then reconstitute it at the other end back into mpz_t? I'm assuming from reading the documentation that it is not really passing around an array, but merely a reference, so why can't it return a reference as well? Is there some good sound programming basis for doing it this way that I should consider in writing my own program?
From gmp.h:
typedef __mpz_struct mpz_t[1];
This makes a lot of sense, and is pretty natural. Think about it: having an
array of size 1 allows you to deal with an obscured pointer (known as opaque
reference) and all its advantages:
mpz_t number;
DoubleIt(number); /* DoubleIt() operates on `number' (modifies it) as
it will be passed as a pointer to the real data */
Were it not an array, you'd have to do something like:
mpz_t number;
DoubleIt(&number);
And then comes all the confusion. The intention behind the opaque type is
to hide these details, so you don't have to worry about them. And one of the main
concerns should be clear: size (which leads to performance). Of course you
can't return by value a struct whose data is limited only by the available memory. What
about this one (consider mpz_t here as a "first-class" type):
mpz_t number = ...;
number = DoubleIt(number);
You (the program) would have to copy all the data in number and push it as a
parameter to your function. Then it needs to leave appropriate space for
returning another number even bigger.
Conclusion: as you have to deal with data indirectly (with pointers) it's
better to use an opaque type. You'll be passing a reference only to your
functions, but you can operate on them as if the whole concept was
pass-by-reference (C itself passes arguments by value, but an array argument decays to a pointer to its first element).
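To make the array-of-one trick concrete without depending on GMP, here is a sketch with a toy type of the same shape. `toy_struct`, `toy_t`, and the `toy_*` functions are hypothetical and hold a single `long`, not GMP's real layout. With real GMP, each temporary would additionally need `mpz_init` before use and `mpz_clear` afterwards.

```c
#include <assert.h>

/* Toy stand-in for __mpz_struct; not the real layout. */
typedef struct { long value; } toy_struct;

/* Same shape as: typedef __mpz_struct mpz_t[1]; */
typedef toy_struct toy_t[1];

/* A toy_t parameter is adjusted to toy_struct *, so each function works
   directly on the caller's object and writes its result through rop. */
void toy_set(toy_t rop, long v)
{
    rop->value = v;
}

void toy_add(toy_t rop, toy_t a, toy_t b)
{
    rop->value = a->value + b->value;
}

void toy_mul(toy_t rop, toy_t a, toy_t b)
{
    rop->value = a->value * b->value;
}

/* (a + b) * (c + d), written in the output-parameter convention. */
void F(toy_t ret, toy_t a, toy_t b, toy_t c, toy_t d)
{
    toy_t tmp1, tmp2;          /* storage lives on the stack, no & needed */
    toy_add(tmp1, a, b);
    toy_add(tmp2, c, d);
    toy_mul(ret, tmp1, tmp2);
}
```

Declaring `toy_t x;` reserves the struct itself, while passing `x` hands the callee a pointer, which is why such a type can be an argument but not a return type.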

Which way is better for creating type-agnostic structures in C?

I'm trying to write some generic structures. Essentially, what I need for my purpose is C++ templates, but since I'm writing in C, templates are out of consideration. Currently I'm considering 2 ways of achieving what I want.
Method 1: use the preprocessor. Like so:
#define DEFINE_PAIR(T) typedef struct Pair_##T{ \
T x; \
T y; \
} Pair_##T
DEFINE_PAIR(int);
int main(){
Pair_int p;
return 0;
}
An obvious downside to it is that you have to invoke the macro before using the type. Probably there are more disadvantages, which I hope you will point out.
Method 2: just use void-pointers, like so:
typedef struct Pair{
void* x;
void* y;
} Pair;
Obviously, this approach is not type safe (I could easily pass a pair of strings to a function expecting a pair of doubles), plus the code doing deallocation gets a lot messier with this approach.
I would like to hear your thoughts on this. Which of the two methods is better/worse and why? Is there any other method I could use to write generic structures in C?
Thanks.
If you only plan on using primitive data types, then your original macro-based solution seems nifty enough. However, when you start storing pairs of pointers to opaque data types with complex structures underneath that are meant to be used by passing pointers between functions, such as:
complex_structure_type *object = complex_structure_type_init();
complex_structure_type_set_title(object, "Whatever");
complex_structure_type_free(object);
then you have to
typedef complex_structure_type *complex_structure_type_ptr;
in order to
DEFINE_PAIR(complex_structure_type_ptr);
so you can
Pair_complex_structure_type_ptr p;
and then
p.x = object;
But that's only a little bit more work, so if you feel it works for you, go for it. You might even put together your own preprocessor that goes through the code, pulls out anything like Pair_whatever, and then adds DEFINE_PAIR(whatever) for the C preprocessor. Anyway, it's definitely a neat idea that you've presented here.
Personally, I would just use void pointers and forget about strong type safety. C just doesn't have the same type safety machinery as other languages, and the more opportunities you give yourself to forget something, the more bugs you'll accidentally create.
Good luck!
Noting that templates in C++ provide a language for writing code, you might simply consider doing code generation with some tool more powerful than the C preprocessor.
Now that does add another step to your build, and makes your build depend on another tool (unless you care to write your own generator in C...), but it may provide the flexibility and type-safety you desire.
This is almost the same, but it's a bit more nimble:
#define PAIR_T(TYPE) \
struct { \
TYPE x; \
TYPE y; \
}
typedef PAIR_T(int) int_pair;
typedef PAIR_T(const char *) string_pair;
int main(void)
{
int_pair p = {1, 1};
string_pair sp = {"a", "b"};
}
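The macro approach can also generate typed helper functions alongside the struct. The sketch below is one possible extension, not something proposed in the answers: `DEFINE_PAIR` takes a second, hypothetical `NAME` parameter so that types which are not a single token (such as `const char *`) need no extra typedef, and the `_make` and `_swap` helpers are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

/* Generates a named pair type and two type-checked helpers for it.
   The trailing "struct Pair_NAME" lets the invocation end with a semicolon. */
#define DEFINE_PAIR(T, NAME)                                   \
    typedef struct Pair_##NAME { T x; T y; } Pair_##NAME;      \
    static Pair_##NAME Pair_##NAME##_make(T x, T y)            \
    {                                                          \
        Pair_##NAME p;                                         \
        p.x = x;                                               \
        p.y = y;                                               \
        return p;                                              \
    }                                                          \
    static void Pair_##NAME##_swap(Pair_##NAME *p)             \
    {                                                          \
        T tmp;                                                 \
        assert(p != NULL);                                     \
        tmp = p->x;                                            \
        p->x = p->y;                                           \
        p->y = tmp;                                            \
    }                                                          \
    struct Pair_##NAME

DEFINE_PAIR(int, int);
DEFINE_PAIR(const char *, str);
```

Passing a `Pair_str *` to `Pair_int_swap` is then diagnosed by the compiler, which is the type safety the void-pointer version gives up.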

Resources