Aliasing structures (or pasting definition of one into another) - c

I want to create an API for setting and getting fields of a structure in an opaque way (clients should only deal with pointers to them and pass them to the methods declared in the header files). Standard stuff, you define your structures inside the library's source files and do
typedef struct __internal_struct shiny_new_opaque_type;
Problem is that at the moment, the class is simply a wrapper around an already existing API (that will change soon). So the structures I need to use are defined in other header files (full structure declaration is there, the one I want to hide from my clients, so any attempt to dereference a pointer and access a structure member will result in a compiler error). Hence, I don't want to include those headers in my header (only in the .c files). I see three possible ways of dealing with it.
Instead of
typedef struct __internal_struct shiny_new_opaque_type;
do
typedef void shiny_new_opaque_type;
and have my methods do pointer casting. This is dangerous since the compiler can't do type checks.
Copy paste the structure definitions I'm currently using under a new struct __internal_struct (eventually I'll have to define my own struct anyway). Maybe this is the best option?
Define my __internal_struct for now to include a single member that is the corresponding structure from the other API I'm using and use that. Kind of ugly...
Basically is there a way to typedef one structure to another or use an already defined structure as an anonymous member inside another, so that at the end of the day both structures are equivalent? Neither of the following works:
typedef struct transparent_struct struct __internal_struct;
struct __internal_struct
{
struct transparent; // anonymous, direct access to its members
}
EDIT:
From the comments, it seems to me that option 3, or a variation thereof, would be the way to go. There is also the possibility of never defining my struct, as @Akira pointed out. So
In header: typedef struct my_type; // never defined
And in my source always use it with a cast (struct transparent*)my_type_ptr
In header: typedef struct _internal_struct my_type;
And in source files:
struct _internal_struct {
struct transparent t;
}
Then I can use either one of those:
my_type_ptr->t.member
((struct transparent*)my_type_ptr)->member

If you are planning to use opaque pointers, you should provide only an incomplete struct type to users and let them do the operations through the provided functions, where one of the parameters is a pointer to your incomplete struct type.
For example, let's consider that we have an API which provides a struct foo type and a void print(struct foo*) function to print the content of struct foo instances. A wrapper can be implemented as follows:
wrapper.h
#ifndef WRAPPER_H
#define WRAPPER_H
struct my_obj; /* incomplete type, but you can use pointers to it */
struct my_obj* create(void); /* creates new struct my_obj instance */
void destroy(struct my_obj*); /* deletes the pointed struct my_obj instance */
void set_name_and_id(struct my_obj*, const char*, unsigned);
void show(struct my_obj*);
#endif /* WRAPPER_H */
wrapper.c
#include "wrapper.h"
#include "api.h" /* API only included here */
#include <stdlib.h>
#include <string.h>
#define TO_FOO(my_ptr) ((struct foo*)my_ptr)
struct my_obj* create(void) {
return calloc(1, sizeof(struct foo)); /* allocates memory for 'struct foo' */
}
void destroy(struct my_obj* obj) {
free(obj);
}
void set_name_and_id(struct my_obj* obj, const char* name, unsigned id) {
strcpy(TO_FOO(obj)->bar, name);
TO_FOO(obj)->baz = id;
}
void show(struct my_obj* obj) {
print(TO_FOO(obj)); /* accepts only 'struct foo' pointers */
}
When users include the wrapper.h from the example above, they won't see the api.h and won't be able to dereference the pointer to struct my_obj because it's an incomplete type.
To respond to your comment:
In this case both the internal_struct and the api one are of the same size, aligned and I can either access the api's members using internal_struct->api.member or ((struct API*)internal_struct)->member. What's your view on those two options?
According to N1570 draft (c11):
6.7.2.1 Structure and union specifiers
15 (...) A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
So, both of your approaches are good and safe, it's up to you which one you like. Using internal_struct->api.member is clear, I would use this version.
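As a minimal sketch of the initial-member rule quoted above, the following compares the two access routes. The names struct api, its fields, and the helper functions are hypothetical stand-ins, not part of any real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the third-party API's structure. */
struct api {
    int member;
    char tag;
};

/* The wrapper's only member is the API structure itself. */
struct internal_struct {
    struct api api;
};

/* Access through the named member: no cast needed. */
int get_member_direct(struct internal_struct *p) {
    return p->api.member;
}

/* Access through a cast: valid because 'api' is the initial member. */
int get_member_cast(struct internal_struct *p) {
    return ((struct api *)p)->member;
}

/* Returns 1 when the wrapper and its first member share one address. */
int same_address(struct internal_struct *p) {
    return (void *)p == (void *)&p->api
        && offsetof(struct internal_struct, api) == 0;
}
```

Both getters read the same object, so the choice between them is about readability rather than behavior.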

Converting comments into an answer.
Avoid option 1 — the API shouldn't use void pointers because of the lack of type safety. C is bad enough as it is; don't go out of your way to drive holes through what type safety is available. If the interface type is struct SomeThing *, you can pass a void * to the function without wittering from the C compiler, but you can't pass a struct SomeThingElse * to the function (without a cast, but needing to add a cast should raise warning flags in your mind). If the API uses void *, you can pass any pointer type to the function without any casts or warnings; that's highly undesirable.
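To illustrate the type-safety point, here is a small sketch with two hypothetical structure types and two hypothetical getters, one typed and one taking void *. The calls that would go wrong are left in a comment so the block still compiles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types used only for this illustration. */
struct some_thing { int value; };
struct some_thing_else { double value; };

/* Typed interface: accepts 'struct some_thing *' (or a 'void *'). */
int typed_get(const struct some_thing *p) {
    return p->value;
}

/* Untyped interface: any object pointer is accepted silently. */
int untyped_get(const void *p) {
    return ((const struct some_thing *)p)->value;
}

/* Correct use of both interfaces. */
int demo_ok(void) {
    struct some_thing a = { 7 };
    return typed_get(&a) + untyped_get(&a);
}

/*
 * struct some_thing_else b = { 1.5 };
 * typed_get(&b);    compilers diagnose the incompatible pointer type
 * untyped_get(&b);  compiles without a diagnostic, then misreads the object
 */
```

Only the typed interface gives the compiler a chance to reject the second pair of calls.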
Option 2 is a maintenance liability, if not nightmare. Don't go there.
Therefore, option 3 is the way to go. You have two sub-options.
3A — your structure simply contains a single member that is a pointer to the API's structure type (struct internal_struct { struct API *api; }), and
3B — your structure simply contains a single member that is the API's structure type (struct internal_struct { struct API api; }) — the difference is the presence or absence of the *.
Both 3A and 3B work; which works better for you depends on the organization of the API you're working with — how near to opaque it treats its structure type. The more nearly opaque the structure type, the more appropriate 3A is. On the other hand, it incurs some overhead in accessing the data.
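A rough sketch of option 3A follows, assuming the wrapped API hands out its objects through a pair of functions. Here api_open, api_close, and the wrapper_* functions are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the wrapped API and its own create/destroy. */
struct API { int member; };
static struct API *api_open(void) { return calloc(1, sizeof(struct API)); }
static void api_close(struct API *a) { free(a); }

/* Option 3A: the wrapper holds a pointer to the API object. */
struct internal_struct {
    struct API *api;
};

struct internal_struct *wrapper_create(void) {
    struct internal_struct *w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->api = api_open();
    if (w->api == NULL) {       /* undo the partial construction */
        free(w);
        return NULL;
    }
    return w;
}

void wrapper_destroy(struct internal_struct *w) {
    if (w != NULL) {
        api_close(w->api);
        free(w);
    }
}

/* Each access pays one extra pointer dereference compared with 3B. */
void wrapper_set(struct internal_struct *w, int v) { w->api->member = v; }
int wrapper_get(const struct internal_struct *w) { return w->api->member; }
```

The extra allocation and indirection are the overhead mentioned above; in exchange, the wrapper never needs to know the size of struct API.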
Indeed I ended up going with option 3B. In this case both the internal_struct and the api one are of the same size, aligned and I can either access the api's members using internal_struct->api.member or ((struct API*)internal_struct)->member. What's your view on those two options?
While the version with the cast works, it sucks as notation. Avoid casts whenever you can — they're a bludgeon that tells the compiler "I know better than you do what I'm doing". I avoid casts as much as possible. Yes, I sometimes use casts; that's almost unavoidable. But I avoid them when possible, and this is a case where it's eminently possible.
Using option 3B, the chances are the compiler generates the same code for both internal_struct->api.member and ((struct API *)internal_struct)->member. So, use the cleaner notation — which is also more succinct. There's a mild nuisance from repeating the api.; there's a bigger nuisance from adding parentheses and repeating struct API *.
If you did the cast once:
struct API *api_ptr = (struct API *)internal_struct;
and then used api_ptr->member etc throughout, that might be sensible, but the castless version would still be better.
struct API *api_ptr = &internal_struct->api;

Related

Using an opaque pointer to a non struct type

In my C programming, I use opaque-pointers to struct as a way to enforce abstraction and encapsulation of my code, in that manner :
interface_header.h:
typedef struct s_mytype t_mytype;
defs_header.h:
struct s_mytype
{
/* Actual definition of the struct */
};
My problem is that I want to use a simple type as t_mytype (char, for example) without informing the interface about that. I can't just use typedef char t_mytype, since that would expose the internals of the type.
I could just use void pointers, but at the cost of type-checking, and I'd rather avoid that.
Doing two typedefs won't work either, since that triggers a "typedef redefinition with different types" error from the compiler.
I am also considering doing a struct with only one member, which will be my simple type, but would that be an overkill ?
Thanks for your answers.
You have to make a decision. With code like this:
typedef char myHandle;
void frobnicate(myHandle *obj);
you already document the intent that client code should only use pointers to myHandle and never assume anything about the underlying type, so you should be able to change the typedef later -- unless there's "sloppy" client code making assumptions it shouldn't.
If you want to completely hide what's behind myHandle, a struct is your only option in C:
typedef struct myHandle myHandle;
void frobnicate(myHandle *obj);
and only in a private header or implementation file, you will put
struct myHandle
{
char val;
};
The first option is simpler because inside the implementation of frobnicate(), you can access your value simply using *obj while the second option requires to write obj->val. What you gain with the second version is that you force client code to be written correctly.
In terms of the resulting executable, both versions are equivalent.
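Put together, the struct-based option might look like the sketch below. The constructor, destructor, and handle_value accessor are hypothetical additions, and the body of frobnicate is invented purely so there is something to observe.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header would contain --- */
typedef struct myHandle myHandle;            /* incomplete type for clients */
myHandle *handle_create(char initial);       /* hypothetical constructor */
void handle_destroy(myHandle *obj);          /* hypothetical destructor */
void frobnicate(myHandle *obj);
char handle_value(const myHandle *obj);      /* hypothetical accessor */

/* --- what stays in a private header or implementation file --- */
struct myHandle {
    char val;
};

myHandle *handle_create(char initial) {
    myHandle *obj = malloc(sizeof *obj);
    if (obj != NULL)
        obj->val = initial;
    return obj;
}

void handle_destroy(myHandle *obj) {
    free(obj);
}

/* Invented behavior: advance the stored character by one. */
void frobnicate(myHandle *obj) {
    obj->val++;
}

char handle_value(const myHandle *obj) {
    return obj->val;
}
```

Client code that sees only the first part cannot write *obj or obj->val, so switching the member from char to something else later does not break it.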

Use of redefining void pointer to pointer to an anonymous structure?

I was working with UEFI driver-related code, and I came across this:
/* EFI headers define EFI_HANDLE as a void pointer, which renders type
* checking somewhat useless. Work around this bizarre sabotage
* attempt by redefining EFI_HANDLE as a pointer to an anonymous
* structure.
*/
#define EFI_HANDLE STUPID_EFI_HANDLE
#include <ipxe/efi/Uefi/UefiBaseType.h>
#undef EFI_HANDLE
typedef struct {} *EFI_HANDLE;
The full source code is in this path
http://dox.ipxe.org/include_2ipxe_2efi_2efi_8h_source.html
This is my first encounter with an anonymous structure, and I couldn't make out the logic of redefining a void * to a pointer to an anonymous structure. What kind of hack does the "bizarre sabotage attempt" hint at?
The library is using information hiding on the internal data object behind the address held in an EFI_HANDLE. But in doing so, they're making the code more susceptible to accidental bugs.
In C, a void* is implicitly converted to any other object pointer type (qualifiers permitting) without a warning; this is by language design.
Using a non-void pointer type ensures an EFI_HANDLE is only used where EFI_HANDLE belongs. The compiler's type-checking kicks you in the groin when you pass it somewhere else that isn't EFI_HANDLE , but rather a pointer to something else.
Ex: As void*, this will compile without warning or error
#include <string.h>
#define EFI_HANDLE void*
int main()
{
EFI_HANDLE handle = NULL;
strcpy(handle, "Something");
}
Changing the alias to:
typedef struct {} *EFI_HANDLE;
will reap the ensuing "incompatible pointer type" compile-time error.
Finally, as an anonymous struct, there is no pointless structure tag name adding to the already-polluted name space that you can use (accidentally or nefariously).
That isn't an anonymous structure, but a struct without a tag. An anonymous structure can only exist as a member of another struct, and it must also not have a tag [1].
Defining a struct without any members is not allowed. The code you're looking at is using a compiler extension that permits this.
The library is doing this to hide the definition of the structure from the user, while maintaining type safety.
However there is a much better way to do this. If you have a hidden structure definition, you can still define an opaque pointer to it, that has a type, so it is type safe:
struct hidden //defined in a file and not exposed
{
int a;
};
void Hidden( struct hidden* );
void Other( struct other* );
struct hidden* a = NULL; //doesn't see the definition of struct hidden
Hidden( a ); //it may be used
Other( a ); //compiler error
[1] (Quoted from: ISO/IEC 9899:201x 6.7.2.1 Structure and union specifiers 13)
An unnamed member whose type specifier is a structure specifier with no tag is called an
anonymous structure; an unnamed member whose type specifier is a union specifier with
no tag is called an anonymous union. The members of an anonymous structure or union
are considered to be members of the containing structure or union. This applies
recursively if the containing structure or union is also anonymous
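The distinction between a struct without a tag and a C11 anonymous structure can be sketched as follows. All type and member names here are made up for the illustration.

```c
#include <assert.h>

/* A struct without a tag, reachable only through its typedef name. */
typedef struct {
    int x;
} untagged_t;

struct outer {
    int id;
    struct {               /* anonymous structure: no tag, no member name */
        int a;
        int b;
    };                     /* a and b count as members of struct outer */
    untagged_t named;      /* ordinary named member of an untagged type */
};

int sum_fields(const struct outer *o) {
    /* o->a works directly; the untagged member still needs its name. */
    return o->a + o->b + o->named.x;
}
```

This needs a C11 (or later) compiler mode, since anonymous structures were only standardized there.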

OOP programming with data encapsulation in C

I tried to do data encapsulation in C based on this post here https://alastairs-place.net/blog/2013/06/03/encapsulation-in-c/.
In a header file I have:
#ifndef FUNCTIONS_H
#define FUNCTIONS_H
// Pre-declaration of struct. Contains data that is hidden
typedef struct person *Person;
void getName(Person obj);
void getBirthYear(Person obj);
void getAge(Person obj);
void printFields(const Person obj);
#endif
In ´functions.c´ I have defined the structure like that
#include "Functions.h"
enum { SIZE = 60 };
struct person
{
char name[SIZE];
int birthYear;
int age;
};
Plus, I have defined the functions there as well.
In main.c I have:
#include "Functions.h"
#include <stdlib.h>
int main(void)
{
// Works because *Person makes new a pointer
Person new = malloc(sizeof new);
getName(new);
getAge(new);
getBirthYear(new);
printFields(new);
free(new);
return 0;
}
Is it true that when I use Person new, new is already a pointer because of typedef struct person *Person;?
How is it possible that the linker cannot see the body and members that I have declared in my struct person?
Is this only possible using pointer?
Is the correct (and only) way to implement OOP principles in my case to make a different struct in functions.h like so:
typedef struct classPerson
{ // This data should be hidden
Person data;
void (*fPtrGetName)(Person obj);
void (*fPtrBirthYear)(Person obj);
void (*fPtrGetAge)(Person obj);
void (*fPtrPrintFields)(const Person obj);
} ClassPerson;
First of all, it is usually better to not hide pointers behind a typedef, but to let the caller use pointer types. This prevents all kinds of misunderstandings when reading and maintaining the code. For example void printFields(const Person obj); looks like nonsense if you don't realize that Person is a pointer type.
Have I understood correctly that when I use Person new, new is already a pointer because of typedef struct person *Person;?
Yes. You are confused because of the mentioned typedef.
How is it possible, that linker cannot see the body and members that I have declared in my ´struct person´?
The linker can see everything that is linked, or you wouldn't end up with a working executable.
The compiler however, works on "translation units" (roughly means a .c file and all its included headers). When compiling the caller's translation unit, the compiler doesn't see functions.c, it only sees functions.h. And in functions.h, the struct declaration gives an incomplete type. Meaning "this struct definition is elsewhere".
Is this only possible using pointer?
Yes, it is the only way if you want to do proper OO programming in C. This concept is sometimes called opaque pointers or opaque type.
(Though you could also achieve "poor man's private encapsulation" through the static keyword. That is usually not really recommended, since it wouldn't be thread-safe.)
Is the correct (and only) way to implement OOP principles in my case to make a different struct in functions.h like so:
Pretty much, yeah (apart from the nit-pick about the mentioned pointer typedef). Using function pointers to the public functions isn't necessary though, although that's how you implement polymorphism.
What your example lacks though is a "constructor" and "destructor". Without them the code wouldn't be meaningful. The malloc and free calls should be inside those, and not done by the caller.
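A constructor and destructor for the question's struct person could look roughly like this. The function names, the currentYear parameter, and the way age is derived are assumptions made for the sketch; the struct layout is the one from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* --- public header part: incomplete type plus functions --- */
struct person;
struct person *person_create(const char *name, int birthYear, int currentYear);
void person_destroy(struct person *p);
const char *person_name(const struct person *p);
int person_age(const struct person *p);

/* --- implementation part: the only place that knows the layout --- */
enum { SIZE = 60 };

struct person {
    char name[SIZE];
    int birthYear;
    int age;
};

struct person *person_create(const char *name, int birthYear, int currentYear) {
    struct person *p = malloc(sizeof *p);   /* the full size is known here */
    if (p == NULL)
        return NULL;
    snprintf(p->name, sizeof p->name, "%s", name);  /* truncates long names */
    p->birthYear = birthYear;
    p->age = currentYear - birthYear;       /* assumed definition of age */
    return p;
}

void person_destroy(struct person *p) {
    free(p);
}

const char *person_name(const struct person *p) { return p->name; }
int person_age(const struct person *p) { return p->age; }
```

The caller never calls malloc or free itself, which also removes the need for it to know the size of the structure.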
With or without typedef, in C you hide data by declaring incomplete types. In /usr/include/stdio.h, you'll find fread(3) takes a FILE * argument:
extern size_t fread (void *__restrict __ptr, size_t __size,
size_t __n, FILE *__restrict __stream) __wur;
and FILE is declared something like this:
struct _IO_FILE;
typedef struct _IO_FILE FILE;
Using stdio.h you cannot define a variable of type FILE, because type FILE is incomplete: it's declared, but not defined. But you can happily pass FILE * around, because all data pointers are the same size. You're just going to have to call fopen(3) to make it point to an open file.
To partially define a type, as in your case:
struct classPerson
{ // This data should be hidden
Person data;
void (*fPtrGetName)(Person obj);
...
};
is a little trickier. First of all, you should have a really good reason, namely that two implementations of fPtrGetName are implemented. Otherwise you're just building complexity on the altar of OOP.
A good example of a good reason is bind(2). You can bind a unix domain socket or a network socket, among others. Both types are represented by struct sockaddr, but that's just a stand-in type for struct sockaddr_un and struct sockaddr_in. Functions that take struct sockaddr depend on the fact that all such structures start with an address-family member (sa_family in struct sockaddr, sun_family in struct sockaddr_un, sin_family in struct sockaddr_in), and branch accordingly. Et voila, polymorphism: one function, many types.
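The idea can be sketched in miniature without the real socket headers. The types below are hypothetical and deliberately not the actual sockaddr layout; the shared header is embedded as the first member so the pointer conversions are covered by the initial-member rule.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical family tags and address types for illustration only. */
enum addr_family { FAM_LOCAL = 1, FAM_INET = 2 };

struct addr       { int family; };                        /* generic stand-in */
struct addr_local { struct addr base; char path[32]; };
struct addr_inet  { struct addr base; unsigned short port; };

/*
 * One function, many types: branch on the shared first member.
 * Returns the path length for local addresses, the port for inet
 * addresses, and -1 for an unknown family.
 */
int addr_describe(const struct addr *a) {
    switch (a->family) {
    case FAM_LOCAL:
        return (int)strlen(((const struct addr_local *)a)->path);
    case FAM_INET:
        return ((const struct addr_inet *)a)->port;
    default:
        return -1;
    }
}
```

Callers pass &local.base or &inet.base, much as socket code passes a pointer converted to struct sockaddr *.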
For an example of a struct full of function pointers, I recommend looking at SQLite. Its API is loaded with structures to isolate it from the OS and let the user define plug-ins.
BTW, if I may say so, fPtrGetName is a terrible name. It's not interesting that it's a function pointer and (controversy!) "get" is noise on a function that takes no arguments. Compare
struct classPerson sargent;
sargent.fPtrGetName();
sargent.name();
Which would you rather use? I reserve "get" (or similar) for I/O functions; at least then you're getting something, not just moving it from one pocket to another! For setting, in C++ I overload the function, so that get/set functions have the same name, but in C I wind up with e.g. set_name(const char name[]).

C empty struct -- what does this mean/do?

I found this code in a header file for a device that I need to use, and although I've been doing C for years, I've never run into this:
struct device {
};
struct spi_device {
struct device dev;
};
and it used as in:
int spi_write_then_read(struct spi_device *spi,
const unsigned char *txbuf, unsigned n_tx,
unsigned char *rxbuf, unsigned n_rx);
and also here:
struct spi_device *spi = phy->spi;
where it is defined the same.
I'm not sure what the point is with this definition. It is in a header file for a linux application of the board, but am baffled by it use. Any explanations, ideas? Anyone seen this before (I'm sure some of you have :).
Thanks!
:bp:
This is not C as C structures have to contain at least one named member:
(C11, 6.7.2.1 Structure and union specifiers p8) "If the struct-declaration-list does not contain any named members, either directly or via an anonymous structure or anonymous union, the behavior is undefined."
but a GNU C extension:
GCC permits a C structure to have no members:
struct empty {
};
The structure has size zero
https://gcc.gnu.org/onlinedocs/gcc/Empty-Structures.html
I don't know what is the purpose of this construct in your example but in general I think it may be used as a forward declaration of the structure type. Note that in C++ it is allowed to have a class with no member.
The Linux 2.4 kernel has an example of an empty structure type combined with conditional compilation, in the definition of the spinlock_t type alias (in include/linux/spinlock.h):
#if (DEBUG_SPINLOCKS < 1)
/* ... */
typedef struct { } spinlock_t;
#elif (DEBUG_SPINLOCKS < 2)
/* ... */
typedef struct {
volatile unsigned long lock;
} spinlock_t;
#else /* (DEBUG_SPINLOCKS >= 2) */
/* ... */
typedef struct {
volatile unsigned long lock;
volatile unsigned int babble;
const char *module;
} spinlock_t;
#endif
The purpose is to save some space without having to change the functions' API when DEBUG_SPINLOCKS < 1. It also allows defining dummy (zero-sized) objects of type spinlock_t.
Another example in the (recent) Linux kernel of an empty structure hack used with conditional compilation in include/linux/device.h:
struct acpi_dev_node {
#ifdef CONFIG_ACPI
void *handle;
#endif
};
See the discussion with Greg Kroah-Hartman for this last example here:
https://lkml.org/lkml/2012/11/19/453
This is not standard C.
C11: 6.2.5-20:
— A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.
J.2 Undefined behavior:
The behavior is undefined in the following circumstances:
....
— A structure or union is defined without any named members (including those
specified indirectly via anonymous structures and unions) (6.7.2.1).
GCC supports it as an extension (no more detail is given there about when or where it should be used). Using this in any program will make it compiler-specific.
One reason a library might do this is that the library developers do not want you to know or interfere with the internals of these structs. In these cases they may provide an "interface" version of the structs spi_device/device (which is what you may see) and have a second type definition that defines another version of said structs, with the actual members, for use inside the library.
Since you cannot access struct members or even create compatible structs of that type yourself with that approach (since even your compiler would not know the actual size of this struct), this only works if the library itself creates the structs, only ever passes you pointers to them, and does not need you to modify any members.
If you add an empty struct as the first member of another struct, the empty
struct can serve as a "marker interface", i.e. when you cast a pointer to that
outer struct to a pointer of the inner struct and the cast succeeds you know
that the outer struct is "marked" as something.
Also, it might just be a placeholder for future development; I'm not too sure. Hope this helps.
This is valid C
struct empty;
struct empty *empty;
and facilitates use of addresses of opaque regions of memory.
Such addresses are usually obtained from and passed to library subroutines.
For example, something like this is done in stdio.h

Difference between using a structure member and cast a structure pointer when "emulate" polymorphism in C

I'm not sure if my wording is technically correct, so please correct me in both title and the main body of this question.
So basically my question is regarding emulating polymorphism in C. For example, suppose I have a tree, and there is a struct tree_node type. And I have some functions to help me insert nodes, delete nodes etc like this as an example:
void tree_insert(tree_node **root, tree_node *new_node);
Then I start to build other stuff for my app, and need to use this tree to maintain, say, family members. But for humans, I have another struct, let's call it "struct human_node", which is defined like this, for example:
typedef struct human_node_ {
tree_node t_node;
char *name;
} human_node;
Now apparently I want to use those tree utility functions I build for the generic tree. But they take tree_node pointers. Now time for the polymorphism emulation. So here are the two options I have, one is to cast my human_node, one is to use the t_node member in the human_node:
human_node *myfamily_tree_root, *new_family_guy;
//some initialization code and other code later...
tree_insert((tree_node **)&myfamily_tree_root, &(new_family_guy->t_node));
For conciseness, I put both ways in one function call above.
And this is exactly where I have my confusion. So which one should I use and more importantly, why?
Both are standard, but in general if you can avoid type casts then you should pick the solution that avoids the type casts.
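As a sketch of the castless route, the root can be kept as a tree_node * so the call site needs no cast at all. The tree_node layout and the trivial tree_insert body below are assumptions, since the question does not show them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout; the question does not show tree_node's definition. */
typedef struct tree_node_ {
    struct tree_node_ *left, *right;
} tree_node;

typedef struct human_node_ {
    tree_node t_node;      /* first member */
    char *name;
} human_node;

/* Placeholder insert: the new node becomes the root, old root goes left. */
void tree_insert(tree_node **root, tree_node *new_node) {
    new_node->left = *root;
    new_node->right = NULL;
    *root = new_node;
}

/* Inserts h and returns the name stored in the resulting root. */
const char *insert_and_get_root_name(tree_node **root, human_node *h) {
    tree_insert(root, &h->t_node);          /* no cast at the call site */
    /* Going back from node to human takes one cast; it is valid
       because t_node is the first member of human_node. */
    return ((human_node *)*root)->name;
}
```

The only cast left is the one that recovers the containing human_node, and it sits in one place instead of at every call.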
A common thing for such inline data structure implementations is to not even require that the tree node (or equivalent) is the first element in the struct, since you might want to enter your nodes into multiple trees. Then you definitely want to use the second approach. To convert between the tree_node element and the struct containing it, you'll have to have some black magic macros, but it's worth it. For example, in an implementation of AVL trees I have these macros:
#ifndef offsetof
#define offsetof(s, e) ((size_t)&((s *)0)->e)
#endif
/* the bit at the end is to prevent mistakes where n is not an avl_node */
#define avl_data(n, type, field) ((type *)(void*)((char *)n - offsetof(type, field) - (n - (struct avl_node *)n)))
So I can have something like:
struct foo {
int data;
struct avl_node tree_node_1;
struct avl_node tree_node_2;
};
int
tree_node_1_to_data(struct avl_node *x)
{
return avl_data(x, struct foo, tree_node_1)->data;
}
If you choose to make your code this generic you definitely want to take references to your tree_node members and not typecast the pointer to the struct.
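For comparison, here is a simplified variant of the same conversion that relies on the standard offsetof from <stddef.h> and drops the type-check trick. The CONTAINER_OF name and the avl_node layout are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>   /* standard offsetof */

/* Assumed node layout; the real avl_node is not shown above. */
struct avl_node {
    struct avl_node *left, *right;
};

struct foo {
    int data;
    struct avl_node tree_node_1;
    struct avl_node tree_node_2;
};

/* Step back from a member's address to the start of its container. */
#define CONTAINER_OF(ptr, type, field) \
    ((type *)(void *)((char *)(ptr) - offsetof(type, field)))

int tree_node_1_to_data(struct avl_node *x) {
    return CONTAINER_OF(x, struct foo, tree_node_1)->data;
}

int tree_node_2_to_data(struct avl_node *x) {
    return CONTAINER_OF(x, struct foo, tree_node_2)->data;
}
```

Because each node member has its own offset, the same struct foo can sit in two trees at once, which a first-member cast could not support.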
This question is probably too broad for any specific answers, but you could for instance see how CPython does it.
Basically, all Python structs have the same header, and to define your own types you must be sure to start your struct with the PyObject_HEAD macro (or PyObject_VAR_HEAD for variably sized objects like strings). This adds stuff like a type tag, reference counts, and so on.
After instantiating objects, you pass them around as PyObject*s, and the functions will infer what type the object actually is (e.g. a string, a list, etc.) and be able to dispatch based on that. Yes, you have to type cast at some point to get to your actual object contents.
For instance, this is how Python's character strings are defined:
typedef struct {
PyObject_VAR_HEAD
Py_hash_t ob_shash;
char ob_sval[1];
/* Invariants:
* ob_sval contains space for 'ob_size+1' elements.
* ob_sval[ob_size] == 0.
* ob_shash is the hash of the string or -1 if not computed yet.
*/
} PyBytesObject;
You can read more about the CPython's type object inheritance model. Extract:
Objects are always accessed through pointers of the type 'PyObject *'.
The type 'PyObject' is a structure that only contains the reference
count and the type pointer. The actual memory allocated for an object
contains other data that can only be accessed after casting the
pointer to a pointer to a longer structure type.
Note that this angle of attack may be more suited for interpreted code. There are probably other open source projects you may look at that are more directly relevant to your needs.

Resources