Opaque structure with flexible array member - c

Suppose I have a struct declaration in a header file like:
event.h
struct event_t;
and in the corresponding C file I would like to sort-of alias it with the Linux-specific struct inotify_event. The problem is that struct inotify_event contains a flexible array member:
struct inotify_event {
int wd;
uint32_t mask;
uint32_t cookie;
uint32_t len;
char name[];
};
As per 6.7.2.1(p3) (emphasis mine):
A structure or union shall not contain a member with incomplete or
function type (hence, a structure shall not contain an instance of
itself, but may contain a pointer to an instance of itself), except
that the last member of a structure with more than one named member
may have incomplete array type; such a structure (and any union
containing, possibly recursively, a member that is such a structure)
shall not be a member of a structure or an element of an array.
it is not possible to define the struct event_t as
struct event_t{
struct inotify_event base; //Non-conforming
};
so that I could convert struct event_t * to struct inotify_event *. Since 6.7.2.1(p3) only restricts what a struct may contain, the solution I see is to redeclare the tag name as
union event_t
and then define it later as a single element union.
union event_t{
struct inotify_event event; //Conforming?
};
The only requirement the Standard imposes on unions that I found is that the set of members of a union must be non-empty, 6.2.5(p20) (emphasis mine):
A union type describes an overlapping nonempty set of member objects,
each of which has an optionally specified name and possibly distinct
type.
QUESTION: Is it a conforming/common way to hide the implementation details of some specific data structure through a union?

This is how I would do it:
event.h
typedef struct event_t event_t; /* opaque: never defined in this variant */
event_t *create_event(void);
void free_event(event_t *ev);
event.c
#include <sys/inotify.h>
#include "event.h"
event_t *create_event(void)
{
struct inotify_event *iev = ...;
return (event_t *)iev;
}
void free_event(event_t *ev)
{
struct inotify_event *iev = (struct inotify_event *)ev;
// free the event
}
However, if you want to store additional data with the event then:
event.h
typedef struct event_t event_t;
event_t *create_event(void);
void free_event(event_t *ev);
event.c
#include <stdlib.h>
#include <sys/inotify.h>
#include "event.h"
struct event_t
{
struct inotify_event *iev;
// additional data
};
event_t *create_event(void)
{
struct inotify_event *iev = ...;
event_t *ev = malloc(sizeof(event_t));
if (ev != NULL) ev->iev = iev; // on failure the caller gets NULL
return ev;
}
void free_event(event_t *ev)
{
struct inotify_event *iev = ev->iev;
// free the event (iev) first
free(ev);
}
If you have multiple implementations that you need to hide in event_t then:
enum event_type
{
EVENT_TYPE_INOTIFY,
EVENT_TYPE_INOTIFY2,
};
struct event_t
{
enum event_type type;
union {
struct inotify_event *iev; // you use this when type == EVENT_TYPE_INOTIFY
struct inotify_event2 *iev2; // you use this when type == EVENT_TYPE_INOTIFY2
}; // anonymous union (C11)
// additional data
};

By far the simplest technique is to put this into your event.h header:
typedef struct inotify_event event_t;
This declares that there is a structure type struct inotify_event and declares an alias for it event_t. But it does not define the content of struct inotify_event at all.
Only the implementation code in event.c includes the definition of struct inotify_event from the system header; everything else does not include that header and cannot access the elements of an event_t except through the accessor API you define.
You can enforce this separation of duties by code review — or by checking with grep, or other similar techniques — to ensure that no code except the implementation of your event type uses the system header for inotify_event. And, if you port to a system other than Linux without support for inotify, then you simply provide an alternative opaque structure type in place of struct inotify_event in your event.h header.
This avoids all questions about whether there are flexible array members within structures, etc; it is all a non-issue.
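As a rough sketch of this arrangement, the header part and the implementation part might look like the following when placed in a single file. The names `my_event`, `my_event_create`, `my_event_name` and `my_event_destroy` are hypothetical, and `struct backend_event` is a stand-in playing the role of `struct inotify_event` so that the example does not depend on Linux headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* --- What event.h would expose: an alias for an incomplete type. --- */
typedef struct backend_event my_event;      /* hypothetical alias */
my_event *my_event_create(int wd, const char *name);
const char *my_event_name(const my_event *ev);
void my_event_destroy(my_event *ev);

/* --- What only event.c would see (stand-in for struct inotify_event). --- */
struct backend_event {
    int wd;
    uint32_t len;
    char name[];                            /* flexible array member */
};

my_event *my_event_create(int wd, const char *name)
{
    size_t n = strlen(name) + 1;
    my_event *ev = malloc(sizeof *ev + n);  /* header plus room for the name */
    if (ev == NULL)
        return NULL;
    ev->wd = wd;
    ev->len = (uint32_t)n;
    memcpy(ev->name, name, n);
    return ev;
}

const char *my_event_name(const my_event *ev)
{
    assert(ev != NULL);
    return ev->name;
}

void my_event_destroy(my_event *ev)
{
    free(ev);
}
```

Client code that sees only the first four declarations can hold and pass `my_event *` values but cannot touch `wd`, `len` or `name` directly.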
Note the Q&A about What does a type followed by _t (underscore t) represent?. Be cautious about creating your own types with the _t suffix; consider using a prefix on such type names that gives you a chance that your names will be distinct from those provided by the system.

A single-element union makes no sense. The purpose of a union is to serve as a kind of polymorphic struct. Struct members are accessed by offset, which is why it is impossible to put an incomplete struct or array in the middle of a struct.
For example
struct foo { int a; int b[]; int c; };
In this example it is impossible for the compiler to determine the address of c because the size of b can vary at runtime. But if you put the incomplete array at the end, the address of every struct member can be determined from the address of the start of the struct. Keep in mind that pointers are just addresses, so you can have pointers to any structs and all the offsets can still be determined, but you will need to deal with the extra alloc/free work.
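To illustrate the point about offsets, here is a minimal sketch of a struct whose incomplete array is the last member. The type `struct int_vec` and its helper functions are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct int_vec {
    size_t count;
    int items[];     /* flexible array member: must come last */
};

/* Allocates a vector with room for `count` ints, all set to zero. */
static struct int_vec *int_vec_new(size_t count)
{
    struct int_vec *v = calloc(1, sizeof *v + count * sizeof v->items[0]);
    if (v != NULL)
        v->count = count;
    return v;
}

/* Stores a value after checking the index against the recorded count. */
static void int_vec_set(struct int_vec *v, size_t i, int value)
{
    assert(i < v->count);
    v->items[i] = value;
}

/* The offset of items is a compile-time constant, whatever count is. */
static size_t int_vec_items_offset(void)
{
    return offsetof(struct int_vec, items);
}
```

Because nothing follows `items`, every member offset is fixed, and only the allocation size depends on the runtime count.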
When you create a union you are telling the compiler: Hey! I have these members, reserve enough space for me so that I can treat this variable as foo or bar. In other words, the compiler takes the largest union member, and that determines the size of the union. A common use for a union is representing multiple kinds of values.
typedef union { int integer; float real; char *string; } value_type;
This way you can treat value_type as an int, a float, or a char pointer. Your code needs to know how to treat each member, but the compiler will make sure that when you do malloc(sizeof(value_type)) you have enough space for any of the three types.
Now your problem. You want to hide implementation details. Usually this is done by declaring a type or struct incompletely in a header, and completely only in your source files. Because of this, when the user includes your header, all the information the compiler has is struct my_struct;. It can't tell the size of my_struct, so it is impossible for the user to allocate it with malloc(sizeof(struct my_struct)). Also, since the user doesn't have the member definitions, they can't mess with the struct internals.
Working like this, you will need to provide the user with functions for allocating and freeing my_struct, for example struct my_struct *foo = my_struct_new() and my_struct_destroy(foo).
You're already doing this. To deal with the struct inotify_event problem I would do one of these.
(1) Surround OS-specific code with #ifdef for that OS, so that event_t has only the right members defined depending on the operating system. You will need #ifdef in your functions too. This has the advantage of keeping useless code out of the final binary, so a smaller footprint.
(2) Have pointers to OS-specific structs and let the runtime decide what to do. This is easier to maintain.

Related

Accessing first field of struct in a union of structs

I have three structs that share the first type and name of the first field:
struct TYPEA {
char *name;
int x,y; /*or whatever*/
};
struct TYPEB {
char *name;
float a[30]; /*or whatever*/
};
struct TYPEC {
char *name;
void *w,*z; /*or whatever*/
};
If I remember correctly the first field of a struct is required to start at the same address as the struct itself.
This makes me wonder if the same holds for a union:
union data {
struct TYPEA;
struct TYPEB;
struct TYPEC;
};
union data *p = function_returning_a_sane_default_for_union_data();
printf("%s", (char*) p);
I have 2 questions regarding this:
are unions required by the standard to always have their content at the same address?
would it work if the structs all had the same fields, so differing only in name?
The first element of a struct or union is guaranteed to have the same address value as the struct/union itself. Obviously, it does not have the same type!
For your usage, you do not need the cast and actually should avoid it:
6.5.2.3p6: One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. …
So you could (see below) simply
printf("%s", p->name);
(Note that your usage of unnamed union members is not standard-compliant. It is a (very useful) gcc extension (-fms-extensions), also supported at least by MSVC.)
But: the code in your question is wrong. You have to name each union member or have the type declarator with each member. With the same first member it will not work, though, because the names of the members of such unnamed members have to be unique (how else are they supposed to be accessed individually?). So this will not really work. What you could do is:
union data {
struct TYPEA typea;
struct TYPEB typeb;
struct TYPEC typec;
};
and
printf("%s", p->typea.name);
even if the union currently contains a value of TYPEB.
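A minimal sketch of that guarantee, using lower-case stand-ins (`type_a`, `type_b`, `type_c`) for the structs in the question and a hypothetical helper `data_name`:

```c
#include <assert.h>
#include <string.h>

struct type_a { const char *name; int x, y; };
struct type_b { const char *name; float a[30]; };
struct type_c { const char *name; void *w, *z; };

/* Every member is named, so this needs no compiler extension. */
union data {
    struct type_a typea;
    struct type_b typeb;
    struct type_c typec;
};

/* Reads the shared first member through typea, whichever member was
   stored last; the union definition is visible here, as 6.5.2.3p6
   requires. */
static const char *data_name(const union data *p)
{
    assert(p != NULL);
    return p->typea.name;
}
```

Only the common initial sequence (here just `name`) may be inspected this way; reading `x` or `y` while a `type_b` is stored is not covered by the guarantee.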
An alternative and clearer way would be to wrap the union into a struct:
struct TypeA {
int x,y;
};
...
struct data {
char *name;
union {
struct TypeA;
struct TypeB;
struct TypeC;
};
};
This also uses the gcc extension at two levels: for the outer struct and the union. As such, it requires unique names for all possible paths. If you want to be 100% compliant, name each member like above and use the full path on access.
Note: I removed the name member from the inner structs in the union and moved it to the outer struct. I also changed names. The only well accepted naming convention in C is to use all-uppercase for macros only.

C structure multiple types

I'd like to write a library in C and I don't know what is the recommended way. I got for example structure and multiple functions like this:
typedef struct example
{
int *val;
struct example *next;
} Example;
and I have build function for multiple types of val
Example* build() { do sth };
Example* buildf() { do sth }; // val is float
Example* buildd() { do sth }; // val is double
What is the better practice (as used in "professional" libraries)? Use a pointer to void and casting, or have a structure for all possibilities: int, float, double?
Use a union and some way to store type info:
typedef struct example
{
enum{ T_STRUCT_WITH_INT, T_STRUCT_WITH_FLOAT, T_SO_ON } type;
union {
int val_int;
float val_float;
} val;
struct example *next;
} Example;
After checking type, access the fields as s->val.val_int.
In C11 the union can be anonymous, and the fields can then be accessed as s->val_int.
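A small sketch of checking the tag before touching the union. The names `struct value`, `value_kind`, `value_from_int`, `value_from_float` and `value_as_double` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

enum value_kind { VALUE_INT, VALUE_FLOAT };

struct value {
    enum value_kind kind;      /* says which union member is valid */
    union {
        int val_int;
        float val_float;
    } val;
};

static struct value value_from_int(int i)
{
    struct value v;
    v.kind = VALUE_INT;
    v.val.val_int = i;
    return v;
}

static struct value value_from_float(float f)
{
    struct value v;
    v.kind = VALUE_FLOAT;
    v.val.val_float = f;
    return v;
}

/* Dispatches on the tag so that only the stored member is ever read. */
static double value_as_double(const struct value *v)
{
    switch (v->kind) {
    case VALUE_INT:
        return (double)v->val.val_int;
    case VALUE_FLOAT:
        return (double)v->val.val_float;
    }
    assert(!"unknown value kind");
    return 0.0;
}
```

Keeping construction and access behind small functions like these means the rest of the library never reads a member that was not the one stored.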
This is primarily based on some combination of opinion, experience and the specific requirements at hand.
The following approach is possible, inspired by some container library work by Jacob Navia. I've never used it myself:
struct container_node {
struct container_node *link_here, *link_there, *link_elsewhere;
/*...*/
char data[0]; /* pre-C99 "struct hack"; zero length is a GNU extension, C99 spells it data[] */
};
struct container_node *container_node_alloc(size_t data_size);
The allocation function allocates the node large enough so that data[0] through data[data_size-1] bytes of storage are available. Through another set of API functions, user data of arbitrary type can be copied in and out.
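One possible shape for such an allocator is sketched below, using a simplified node with a C99 flexible array member. The `data_size` field and the `container_node_put` / `container_node_get` accessors are assumptions added for this example, not part of any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct container_node {
    struct container_node *next;
    size_t data_size;          /* assumption: remember the payload size */
    unsigned char data[];      /* C99 flexible array member */
};

/* Allocates a node with data_size bytes of payload storage. */
struct container_node *container_node_alloc(size_t data_size)
{
    struct container_node *n = malloc(sizeof *n + data_size);
    if (n == NULL)
        return NULL;
    n->next = NULL;
    n->data_size = data_size;
    return n;
}

/* Copies user data into the node; memcpy avoids alignment concerns. */
void container_node_put(struct container_node *n, const void *src, size_t size)
{
    assert(size <= n->data_size);
    memcpy(n->data, src, size);
}

/* Copies user data back out of the node. */
void container_node_get(const struct container_node *n, void *dst, size_t size)
{
    assert(size <= n->data_size);
    memcpy(dst, n->data, size);
}
```

Copying in and out keeps the container ignorant of the payload type, at the cost of one memcpy per access.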
The following approach is sometimes called "intrusive container". The container defines only a "base class" consisting of the link structure. The user must embed this structure into their own structure:
struct container_node {
struct container_node *next, *prev;
};
void container_insert(struct container *container, struct container_node *n);
struct container_node *container_first(struct container *container);
The user does this:
struct my_widget {
struct container_node container_links;
int widget_height;
/* ... */
};
/* .... */
/* We don't insert my_widget, but rather its links base. */
container_insert(&widg_container, &widget->container_links);
Some macros are used to convert between a pointer to the widget and a pointer to the container links. See the container_of macro used widely in the Linux kernel:
struct my_widget *wptr = container_of(container_first(&widg_container),
struct my_widget, container_links);
See this question.
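For illustration, a portable approximation of container_of can be written with offsetof. This is a sketch, not the kernel's definition (which adds a type check using compiler extensions); the helper `widget_from_links` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Steps back from a member's address to the enclosing struct's address. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct container_node {
    struct container_node *next, *prev;
};

struct my_widget {
    int widget_height;
    struct container_node container_links;  /* deliberately not first */
};

/* Recovers the widget from a pointer to its embedded links. */
static struct my_widget *widget_from_links(struct container_node *n)
{
    assert(n != NULL);
    return container_of(n, struct my_widget, container_links);
}
```

The links do not have to be the first member; the macro subtracts whatever offset the member actually has.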
Then there are approaches that store a union in each node, which provides an integer, a floating-point value or a pointer. In that case, the data is separately allocated (though not necessarily: if the caller controls the allocation of the nodes, it's still possible to put the node structure and the user data in a buffer that came from a single malloc call).
Finally, there are also approaches which wrap these techniques with preprocessor templating, an example of which are the BSD QUEUE macros.

Why are typedef names used twice in struct declaration in C?

While researching queues in C, I came across an example similar to the below. Why is the struct named both at the beginning of the curly braces and after? Why is struct type used again inside of the struct when adding an item of the same type? Are these things redundant or is there a point?
typedef void* vpoint_t;
typedef struct queue_item_t{
vpoint_t void_item;
struct queue_item_t* next;
} queue_item_t;
typedef struct queue_item_t { // 1
vpoint_t void_item;
struct queue_item_t* next; // 2
} queue_item_t; // 3
First of all, note that this entire statement is defining a typedef.
3) Is saying that the new type we are in the process of defining (via typedef) is going to be named queue_item_t.
1) The name of the structure (which is being given a new name, as we go), is struct queue_item_t. That's its full name, including struct at the front.
2) Because the new type doesn't yet exist (remember, we're still in the process of defining it), we have to use the only name it has thus far, which is struct queue_item_t, from 1).
Note that you can have anonymous struct definitions, which allow you to omit the name from 1). A simple example:
typedef struct {
int x, y, z;
} vector3;
In your example however, since we need the structure to be able to refer to itself, the next pointer must have a type that's already defined. We can do that by forward declaring the struct, typedefing it, then defining the struct using the typedefd type for next:
struct _queue_item; // 4
typedef struct _queue_item queue_item_t; // 5
struct _queue_item { // 6
vpoint_t void_item;
queue_item_t* next; // 7
};
4) Declare that struct _queue_item exists, but don't yet provide a definition for it.
5) Typedef queue_item_t to be the same as struct _queue_item.
6) Give the definition of the structure now...
7) ...using our typedef'd queue_item_t.
All that being said... In my opinion, please don't use typedefs for structs.
struct queue_item {
void *data;
struct queue_item *next;
};
is simple and complete. You can manage to type those six extra characters.
From the Linux Kernel coding style:
Chapter 5: Typedefs
Please don't use things like "vps_t".
It's a mistake to use typedef for structures and pointers. When you see a
vps_t a;
in the source, what does it mean?
In contrast, if it says
struct virtual_container *a;
you can actually tell what "a" is.
There are exceptions, which you can read about.
Some recent related questions:
Why use an opaque “handle” that requires casting in a public API rather than a typesafe struct pointer?
Does casting a pointer back and forth from size_t or uintptr_t break strict aliasing?
Let's change the declaration a little bit to make the discussion easier to follow:
typedef struct queue_item {
vpoint_t void_item;
struct queue_item* next;
} QueueItemType;
C supports several different name spaces; one name space is reserved for tag names on unions, structures, and enumeration types. In this case, the tag name is queue_item. Another name space is reserved for regular identifiers, including typedef names like QueueItemType.
The next member is being used to point to another instance of type struct queue_item (i.e., the next item in the queue). It's declared as a pointer to struct queue_item for two reasons:
A struct type cannot contain an instance of itself; for one thing, the type would have to be infinitely large (struct queue_item contains a member next, which is a struct queue_item that contains a member next, which is a struct queue_item that contains a member next, ad infinitum);
The struct type definition isn't complete until the closing }, and you can't declare an instance of an incomplete type. However, you can declare a pointer to an incomplete type, which we do below:
struct queue_item *next;
Why not use QueueItemType *next; instead of struct queue_item *next? Again, the struct type definition isn't complete at the point next is being declared; the typedef name QueueItemType doesn't exist yet. However, the tag name queue_item is already visible to the compiler, so we can declare pointers using the type struct queue_item.
Since tag names and typedef names occupy different name spaces, it's possible to use the same name for both the tag name and the typedef name without a collision. The compiler disambiguates between the two by the presence of the struct keyword.
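A short sketch of that situation, with the tag and the typedef name spelled identically. The helpers `queue_push` and `queue_length` are hypothetical and exist only to exercise the type.

```c
#include <assert.h>
#include <stddef.h>

typedef struct queue_item {
    void *data;
    struct queue_item *next;   /* typedef name not declared yet: use the tag */
} queue_item;                  /* same spelling as the tag, other name space */

/* Pushes item at the front and returns the new head. */
static queue_item *queue_push(queue_item *head, queue_item *item)
{
    assert(item != NULL);
    item->next = head;
    return item;
}

/* Both spellings name the same type, so they can be mixed freely. */
static size_t queue_length(const queue_item *head)
{
    size_t n = 0;
    for (const struct queue_item *p = head; p != NULL; p = p->next)
        n++;
    return n;
}
```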
First, I suggest that you NEVER use a struct name and an identical type name. Use typedef struct QUEUE_ITEM {...} queue_item_t;
As to the question: if you want to make a "recursive data structure", that is, a data structure that has pointers to instances of itself, then you must be able to tell the compiler "This field is a pointer to one of ourselves. You don't know completely what we look like yet, because I am still defining it, so just reserve space for a pointer". To do that you declare
struct T {
...
struct T *ptr;
....
};
With the final } queue_item_t; you create a new name for the structure.
"struct foo {...} " is one thing. It defines a struct, you need to type "struct foo" in every place you use it.
"typedef ... foo" defines a new type, so you just type "foo" where you use it.
"typedef struct foo {...} foo" is an idiom so you can use both, most probably just "foo" to save keystrokes and visual pollution.

Why does "struct T* next" compile when T isn't an existing type?

I am using MinGW on Windows. I am building a linked list and I am confused by this.
#include <stdio.h>
#include <stdlib.h>
typedef struct Data
{
int x;
int y;
struct BlaBla * next; /*compiles with no problem*/
}List;
int main(void)
{
List item;
List * head;
head = NULL;
return 0;
}
I know that a struct can't have a struct variable (object, instance of that struct), but can have a pointer of that struct type. I didn't know that a pointer can be a pointer to an unexisting type. struct BlaBla * next; (not for a linked list, where it must be struct Data * next, but generally speaking)
Yes, you can, because the compiler, upon encountering the unknown type name for the first time, assumes that there's a struct type definition with this name somewhere. It will then forward-declare the struct name for you and let you use it as a pointer, but you can't dereference it, nor can you do pointer arithmetic on it (since it's an incomplete type).
The compiler will accept code such as your example:
typedef struct Data
{
int x;
int y;
struct BlaBla * next; /*compiles with no problem*/
}List;
This is okay because the size of pointers is known to the compiler, and the compiler is assuming that the struct will be defined before it is dereferenced.
Because the compiler acts this way, it's possible to do this:
typedef struct Data
{
int x;
int y;
struct Data * next; /* points to itself */
} List;
However, if you were to include the struct inline, like this:
typedef struct Data
{
int x;
int y;
struct BlaBla blaStruct; /* Not a pointer. Won't compile. */
}List;
The compiler can't work out how big struct Data is because it doesn't know how big struct BlaBla is. To get this to compile, you need to include the definition of struct BlaBla.
Note that, as soon as you need to access the members of struct BlaBla, you will need to include the header file that defines it.
It depends on what you mean by "unexisting". If you haven't even declared BlaBla, you'll get an error.
If you've declared it but not yet defined it, that will work fine. You're allowed to have pointers to incomplete types.
In fact, that's the normal way of doing opaque pointers in C.
So, you might think that this is invalid because there's no declaration of struct BlaBla in scope:
typedef struct Data {
struct BlaBla *next; // What the ??
} List;
However, it's actually okay since it's both declaring struct BlaBla and defining next at the same time.
Of course, since definition implies declaration, this is also okay:
struct BlaBla { int xyzzy; };
typedef struct Data {
struct BlaBla *next; // What the ??
} List;
In order to declare a variable or field of a given type, pass one as a parameter, or copy one to another of the same type, the compiler has to know how many bytes the variable or field occupies, what alignment requirements it has (if any), and what other pointer types it's compatible with, but that's all the compiler needs to know about it. In all common dialects of C, a pointer to any structure will always be the same size and require the same alignment, regardless of the size of the structure to which it points or what that structure may contain, and pointers to any structure type are only compatible with other pointers to the same structure type.
Consequently, code which doesn't need to do anything with pointers to a structure except allocate space to hold the pointers themselves [as opposed to the structures at which they point], pass them as parameters, or copy them to other pointers, doesn't need to know anything about the structure type to which they point beyond its unique name. Code which needs to allocate space for a structure (as opposed to a pointer to one) or access any of its members must know more about its type, but code which doesn't do those things doesn't need such information.
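The following sketch shows code that stores and passes pointers to a struct before the struct is defined. All names (`struct widget`, `struct slot`, `slot_store`, `slot_load`, `widget_height`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct widget;                       /* declared, but not defined yet */

/* Holding a pointer needs only the pointer's size, not the struct's. */
struct slot {
    struct widget *item;
};

static void slot_store(struct slot *s, struct widget *w)
{
    assert(s != NULL);
    s->item = w;                     /* copying the pointer is fine */
}

static struct widget *slot_load(const struct slot *s)
{
    return s->item;
}

/* Only from here on can code use sizeof(struct widget) or its members. */
struct widget {
    int height;
};

static int widget_height(const struct widget *w)
{
    return w->height;
}
```

Moving `widget_height` above the definition of `struct widget` would make it fail to compile, while the slot functions are unaffected.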

arrays of not yet defined types (incomplete element type)

Is it possible to make an array of declared but not defined types? This is what I would like to do:
typedef struct _indiv indiv;
typedef indiv pop[];
and let somebody else decide what an individual's members actually are by defining the struct _indiv in another .c or .h file (and then linking everything together).
(For the semantics, indiv is an individual and pop is a population of individuals.)
But the compiler complains:
error: array type has incomplete element type
I could replace the second typedef by
typedef indiv * pop;
And use pop like an array by accessing the elements like p[i] (with p of type pop), but if I do that the compiler will complain that
error: invalid use of undefined type ‘struct _indiv’
error: dereferencing pointer to incomplete type
I suppose that since typedef struct _indiv indiv is only a declaration, the compiler does not know at compile time (before linking) how much space the struct requires and doesn't like that, thus forbidding what I'm trying to do. But I would like to know why, and whether there is a possible way to achieve what I want.
Thanks
If you want this source file to manipulate items of type indiv, then you have 2 choices.
1) Declare the structure, but don't define it. Use only pointers to the structure. Never dereference them:
struct _indiv;
typedef struct _indiv indiv;
typedef indiv * pop;
//sizeof(indiv) is not known, how many bytes should we allocate?
pop p = malloc(N*unknownSize);
//this line will fail because the compiler does not know how many bytes to copy.
p[0] = getIndiv();
2) define the complete structure:
struct _indiv
{
int id;
char* name;
/*...*/
};
typedef struct _indiv indiv;
typedef indiv * pop;
pop p = malloc(N*sizeof(indiv));
//Now this line can work.
p[0] = getIndiv();
The suggestion to define a dummy 'indiv' is a bad one:
--- file1.c
struct _indiv
{
char dummy;
};
typedef struct _indiv indiv;
typedef indiv * pop;
pop p = malloc(N*sizeof(indiv)); //this will allocate N bytes.
//This will generate code that copies one byte of data.
p[0] = getIndiv();
---realIndiv.c
typedef struct _indiv
{
int id;
char* name;
/*...*/
} indiv;
indiv getIndiv(void)
{
indiv i = /* whatever */;
return i; //this will return 8+ bytes.
}
When you do this, the first file will be manipulating a differently sized item than the "real" indiv struct, and you are sure to get unexpected behaviour.
You are right that the compiler doesn't know the size of incomplete types (in your example, struct _indiv is an incomplete type), which is why you cannot declare a variable of such a type. This includes creating an array of such types.
However, this doesn't really matter, because if you don't have the complete definition of the type, then you can't sensibly access its members anyway: if you write p[i].foo, how do you know if the type actually has a member called foo, and if it does, what type it is?
If you want the struct type's members to be defined in another .c file (this is known as an "opaque type"), then you must only ever create and handle pointers to the struct. Your other .c should contain all the code that actually accesses the struct itself. The file that has only the incomplete type would contain code like:
indiv *i1, *i2;
i1 = new_individual("foo"); /* Create an individual */
i2 = new_individual("bar");
print_individual(i1);
...and the source file with the complete definition of the struct would contain the implementation of new_individual(), print_individual() and so on.
Under this scheme, the easiest way to deal with a population is to make it an array of pointers to indiv structs.
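A sketch of that scheme in one file, with the part that sees only the incomplete type placed above the struct definition. The tag `indiv_s` and the functions `individual_name`, `free_individual` and `pop_new` are hypothetical additions; `new_individual` follows the usage shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- interface: indiv is incomplete for everything in this part --- */
typedef struct indiv_s indiv;
indiv *new_individual(const char *name);
const char *individual_name(const indiv *i);
void free_individual(indiv *i);

/* A population is an array of pointers, so sizeof(indiv) is not needed. */
typedef indiv **pop;

static pop pop_new(size_t n)
{
    return calloc(n, sizeof(indiv *));
}

/* --- implementation: the only part that knows the members --- */
struct indiv_s {
    char *name;
};

indiv *new_individual(const char *name)
{
    size_t n = strlen(name) + 1;
    indiv *i = malloc(sizeof *i);
    if (i == NULL)
        return NULL;
    i->name = malloc(n);
    if (i->name == NULL) {
        free(i);
        return NULL;
    }
    memcpy(i->name, name, n);
    return i;
}

const char *individual_name(const indiv *i)
{
    assert(i != NULL);
    return i->name;
}

void free_individual(indiv *i)
{
    if (i != NULL) {
        free(i->name);
        free(i);
    }
}
```

Indexing the population with `p[i]` yields an `indiv *`, which is a complete type, so the errors from the question do not arise.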
You can only define an array of pointers to an undefined type, because you don't know the size of that type.
Note that in C language you can define the same struct differently in many places. You can use this technique: Simply define your struct anyhow, then you can freely define and use pointers to that type. And then define the real struct with the same name somewhere else. Also you get the same effect when you simply use arrays of void*.

Resources