Recently I started looking into some operating system source code, and there is a coding technique that puzzles me a lot.
First, the source code declares a very basic struct, such as:
struct cmd {
int type;
};
Then it declares several other structs that begin with the same member as that basic struct:
struct execcmd {
int type; //Here.
char *argv[MAXARGS];
char *eargv[MAXARGS];
};
struct redircmd {
int type; //Here.
struct cmd *cmd;
char *file;
char *efile;
int mode;
int fd;
};
Because the first few bytes of these structs are identical, we can access the shared int type member even when we are not sure exactly which structure we have. We can then use the type member to cast the struct pointer to the correct type:
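void runcmd(struct cmd *cmd)
{
struct execcmd *ecmd;
struct redircmd *rcmd;
struct listcmd *lcmd;
struct pipecmd *pcmd;
struct backcmd *bcmd;
switch(cmd->type){
case EXEC:
ecmd = (struct execcmd*)cmd;
/* ... use ecmd ... */
break;
case REDIR:
rcmd = (struct redircmd*)cmd;
break;
case LIST:
lcmd = (struct listcmd*)cmd;
break;
case PIPE:
pcmd = (struct pipecmd*)cmd;
break;
case BACK:
bcmd = (struct backcmd*)cmd;
break;
}
}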
So my question is: what is the name and benefit of this technique, or what is its normal use case?
This is known as a "common initial sequence". Per 6.5.2.3 Structure and union members, paragraph 6 of the C11 standard:
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
Strictly speaking, the code you have posted is incorrect as the struct members are not used via a common union.
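For comparison, here is a minimal sketch of the form the quoted paragraph does cover, assuming hypothetical tag values EXEC and REDIR and trimmed-down member lists: every command struct is placed in one union, and the tag is read through whichever member is convenient.

```c
#include <assert.h>

/* Hypothetical tag values and trimmed-down structs, for illustration only. */
enum { EXEC = 1, REDIR = 2 };

struct cmd      { int type; };
struct execcmd  { int type; char *argv[8]; };
struct redircmd { int type; int fd; };

/* Every command lives inside this union, so the common-initial-sequence
   guarantee applies: `type` may be read through any member. */
union anycmd {
    struct cmd      base;
    struct execcmd  exec;
    struct redircmd redir;
};

/* Reads the shared tag without knowing which member is active. */
int cmd_type(const union anycmd *u) {
    return u->base.type;
}

/* Accesses the redir member only after checking the tag. */
int redir_fd(const union anycmd *u) {
    assert(u->base.type == REDIR);
    return u->redir.fd;
}
```

After storing a redircmd with type REDIR and fd 3 in a union anycmd, cmd_type returns REDIR and redir_fd returns 3.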
This technique is used any time you're storing some data that may have an arbitrary type from some set of types. This is what could be called a variant type.
Suppose you were writing a parser for mathematical expressions - building an AST (abstract syntax tree). You'd want each node in the tree to be able to be handled generically by some code that for example can serialize and deserialize the tree. The generic code could use the type tag to call the type-specific serialization/deserialization method (also called a virtual observer method). The observers would then cast the node to a concrete "derived" type, and use that to operate on it.
enum NodeType { NodeA, NodeB };
struct Node {
enum NodeType type;
} typedef Node;
typedef void (*Observer)(Node *node, void *context);
void serializeNodeA(Node *node, void *context);
void deserializeNodeA(Node *node, void *context);
void serializeNodeB(Node *node, void *context);
void deserializeNodeB(Node *node, void *context);
struct VirtualMethods {
Observer serialize;
Observer deserialize;
} typedef VirtualMethods;
const VirtualMethods vtables[] = {
{serializeNodeA, deserializeNodeA},
{serializeNodeB, deserializeNodeB}};
void serializeNode(Node *node, void *context) {
int type = node->type;
Observer serialize = vtables[type].serialize;
serialize(node, context);
}
void deserializeNode(Node *node, void *context) {
int type = node->type;
Observer deserialize = vtables[type].deserialize;
deserialize(node, context);
}
There are other applications as well, of course.
Using an integer type tag to select a virtual function table saves space compared to storing a pointer to the table directly, and it is more flexible, since comparing types doesn't require comparing pointers to tables.
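The tag-indexed dispatch above can be exercised end to end with a small self-contained sketch. The concrete types NodeAData and NodeBData, their value fields, and the choice of a char buffer as the serialization context are assumptions for illustration; only serialize is wired up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum NodeType { NodeA, NodeB };

typedef struct Node { enum NodeType type; } Node;

/* Hypothetical concrete node types; each starts with a Node. */
typedef struct { Node base; int value; }    NodeAData;
typedef struct { Node base; double value; } NodeBData;

typedef void (*Observer)(Node *node, void *context);

/* Here `context` is assumed to be a char buffer of at least 64 bytes. */
static void serializeNodeA(Node *node, void *context) {
    NodeAData *a = (NodeAData *)node;   /* downcast justified by the tag */
    snprintf(context, 64, "A:%d", a->value);
}

static void serializeNodeB(Node *node, void *context) {
    NodeBData *b = (NodeBData *)node;
    snprintf(context, 64, "B:%.1f", b->value);
}

typedef struct { Observer serialize; } VirtualMethods;

/* One table entry per enum NodeType value. */
static const VirtualMethods vtables[] = {
    [NodeA] = { serializeNodeA },
    [NodeB] = { serializeNodeB },
};

/* Generic entry point: the tag selects the type-specific function. */
void serializeNode(Node *node, void *context) {
    assert(node->type == NodeA || node->type == NodeB);
    vtables[node->type].serialize(node, context);
}
```

Serializing a NodeAData holding 42 writes "A:42" into the buffer, and a NodeBData holding 2.5 writes "B:2.5".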
Related
After passing a void* pointer as an argument to a function, is there a way to specify, as another parameter, the type to which it should be cast? For example, if I have two structs like:
struct A {
int key;
char c;
};
struct B {
int key;
float d;
};
Is it possible to define a function,
void func(void * ptr, ...){
//operate on key
}
and pass a pointer to either struct to the function (after casting it to void*), then access the key member from within the function?
I'm trying to understand the use of void*, how structure definitions are stored (how are the offsets of the various members determined from the structure definition?), and how polymorphism may be implemented in C.
I was trying to see whether I could write binary search tree functions that can deal with nodes of any struct type.
After passing a void* pointer as argument to a function, is there a way to specify the type to which it is cast as another parameter?
Yes and no.
I suppose you're hoping for something specific to this purpose, such as a variable that conveys a type name that the function can somehow use to perform the cast. Something along the lines of a type parameter in a C++ template, or a Java generic method, for example. C does not have any such thing.
But of course, you can use an ordinary integer to convey a code representing which of several known-in-advance types to cast to. If you like, you can even use an enum to give those codes meaningful names. For example:
#include <assert.h>
enum arg_type { STRUCT_A_TYPE, STRUCT_B_TYPE };
void func(void *ptr, enum arg_type type) {
int key = 0;
switch (type) {
case STRUCT_A_TYPE:
key = ((struct A *) ptr)->key;
break;
case STRUCT_B_TYPE:
key = ((struct B *) ptr)->key;
break;
default:
assert(0);
}
// ...
}
Note that this approach allows accessing any member of the pointed-to structure. But if you only want to access the first member, and it has the same type in every structure type of interest, then you don't need to know the specific structure type. In that particular case, you can cast directly to the member type:
void func(void *ptr) {
int key = *(int *)ptr;
// ...
}
That relies on C's guarantee that a pointer to any structure, suitably cast, points to that structure's first member.
Trying to understand the use of void*, how structure definitions are stored and how polymorphism may be implemented in C.
That's awfully broad.
C does not offer polymorphism as a language feature, and C objects do not carry information about their type such as could be used to dispatch type-specific functions. You can, of course, implement that yourself, but it is non-trivial. Available approaches include, but are not limited to,
passing pointers to functions that do the right thing for the type of your data. The standard qsort() and bsearch() functions are the canonical examples of this approach.
putting some kind of descriptor object as the first member of every (structure) type. The type of that member can itself be a structure type, so it can convey arbitrarily complex data, such as a vtable. As long as it is the first member of all your polymorphic structures, you can always access it from a pointer to any of them by casting to its type, as discussed above.
Using tagged unions of groups of polymorphic types (requiring that all the type alternatives in each group be known at build time). C then allows you to look at any members of the common initial sequence of all union members without knowing which member actually has a value. That initial sequence would ordinarily include the tag, so that you don't have to pass it separately, but it might include other information as well.
Polymorphism via (single-)inheritance can be implemented by giving each child type an object of its parent type as its first member. That then allows you to cast to (a pointer to) any supertype and get the right thing.
Let's say you had a sort function that takes, as a parameter, a function implementing the "compare" part of the sort. The sort could then sort a list of any arbitrary struct, given a comparer function that implements the correct order for your particular struct.
void bubbleSort(Node* start, bool comparerFunction(void* a, void* b))
Consider the following struct definition:
typedef struct {
int book_id;
char title[50];
char author[50];
char subject[100];
char ISBN[14]; /* 13 characters + terminating '\0' */
} Book;
And this unremarkable linked list definition:
typedef struct node{
void* item;
struct node* next;
} Node;
It can store (a pointer to) an arbitrary struct in its item member.
Because you know the type of the members you've placed in your linked list, you can write a comparer function that will do the right thing:
bool sortByTitle(void* left, void* right) {
Book* a = (Book*)left;
Book* b = (Book*)right;
return strcmp(a->title, b->title) > 0;
}
And then call your sort like this:
bubbleSort(myList, sortByTitle);
For completeness, here is the bubbleSort implementation:
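#include <stdbool.h>
void swap(Node *a, Node *b); /* defined below; declared first so bubbleSort can call it */
/* Bubble sort the given linked list */
void bubbleSort(Node *start, bool greaterThan(void* a, void* b))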
{
int swapped, i;
Node* ptr1;
Node* lptr = NULL;
/* Checking for empty list */
if (start == NULL)
return;
do
{
swapped = 0;
ptr1 = start;
while (ptr1->next != lptr)
{
if (greaterThan(ptr1->item, ptr1->next->item))
{
swap(ptr1, ptr1->next);
swapped = 1;
}
ptr1 = ptr1->next;
}
lptr = ptr1;
}
while (swapped);
}
/* function to swap data of two nodes a and b*/
void swap(Node *a, Node *b)
{
void* temp = a->item;
a->item = b->item;
b->item = temp;
}
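To see the pieces working together, here is a self-contained version on a three-node list built by hand. The Book struct is trimmed to two fields for brevity, which is an assumption of this sketch, not part of the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { int book_id; char title[50]; } Book;   /* trimmed Book */
typedef struct node { void *item; struct node *next; } Node;

/* Swap the payloads of two nodes; the links stay where they are. */
static void swap(Node *a, Node *b) {
    void *temp = a->item;
    a->item = b->item;
    b->item = temp;
}

/* Bubble sort the list using a caller-supplied "greater than" predicate. */
void bubbleSort(Node *start, bool greaterThan(void *a, void *b)) {
    bool swapped;
    Node *lptr = NULL;          /* nodes from lptr onward are already sorted */
    if (start == NULL)
        return;
    do {
        swapped = false;
        Node *p = start;
        while (p->next != lptr) {
            if (greaterThan(p->item, p->next->item)) {
                swap(p, p->next);
                swapped = true;
            }
            p = p->next;
        }
        lptr = p;
    } while (swapped);
}

/* Comparer that knows the items are Books. */
bool sortByTitle(void *left, void *right) {
    return strcmp(((Book *)left)->title, ((Book *)right)->title) > 0;
}
```

With nodes holding the titles "Zeta", "Alpha" and "Mid" in that order, bubbleSort(&first, sortByTitle) leaves them ordered "Alpha", "Mid", "Zeta".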
I'm coming from Java and I'm trying to implement a doubly linked list in C as an exercise. I wanted to do something like Java generics, where I would pass a pointer type to the list initialization and that pointer type would be used to cast the list's void pointer, but I'm not sure whether this is possible.
What I'm looking for is something that can be stored in a list struct and used to cast *data to the correct type from a node. I was thinking of using a double pointer but then I'd need to declare that as a void pointer and I'd have the same problem.
typedef struct node {
void *data;
struct node *next;
struct node *previous;
} node;
typedef struct list {
node *head;
node *tail;
//??? is there any way to store the data type of *data?
} list;
Typically, type-specific functions like the following are used:
void List_Put_int(list *L, int *i);
void List_Put_double(list *L, double *d);
int * List_Get_int(list *L);
double *List_Get_double(list *L);
A less learner-friendly approach uses _Generic. C11 offers _Generic, which lets code be steered at compile time based on the type of an expression.
The code below offers basic save/fetch support for 3 pointer types. The macros need extending for each new type. _Generic does not allow two listed types that may turn out to be the same type, such as unsigned * and size_t * (identical on platforms where size_t is unsigned int), so there are limitations.
The type_id(X) macro maps the 3 types to integer codes, which can be used to detect run-time problems, as with LIST_POP(L, &d); below.
#include <assert.h>
#include <stdio.h>
typedef struct node {
void *data;
int type;
} node;
typedef struct list {
node *head;
node *tail;
} list;
node node_var;
void List_Push(list *l, void *p, int type) {
// tbd code - simplistic use of global for illustration only
node_var.data = p;
node_var.type = type;
}
void *List_Pop(list *l, int type) {
// tbd code
assert(node_var.type == type);
return node_var.data;
}
#define cast(X,ptr) _Generic((X), \
double *: (double *) (ptr), \
unsigned *: (unsigned *) (ptr), \
int *: (int *) (ptr) \
)
#define type_id(X) _Generic((X), \
double *: 1, \
unsigned *: 2, \
int *: 3 \
)
#define LIST_PUSH(L, data) { List_Push((L),(data), type_id(data)); }
#define LIST_POP(L, dataptr) (*(dataptr)=cast(*dataptr, List_Pop((L), type_id(*dataptr))) )
Usage example and output
int main() {
list *L = 0; // tbd initialization
int i = 42;
printf("%p %d\n", (void*) &i, i);
LIST_PUSH(L, &i);
int *j;
LIST_POP(L, &j);
printf("%p %d\n", (void*) j, *j);
double *d;
LIST_POP(L, &d);
}
(address of i) 42
(address of i) 42
(assertion failure in List_Pop)
There is no way to do what you want in C. There is no way to store a type in a variable and C doesn't have a template system like C++ that would allow you to fake it in the preprocessor.
You could define your own template-like macros that could quickly define your node and list structs for whatever type you need, but I think that sort of hackery is generally frowned upon unless you really need a whole bunch of linked lists that only differ in the type they store.
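For reference, a minimal sketch of such template-like macros. DEFINE_LIST and the generated names (T_node, NAME_push_back, NAME_clear) are hypothetical; each expansion stamps out a node type, a list type and functions specialised for one element type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro: expands to a doubly linked list specialised for T,
   with every generated name prefixed by NAME. */
#define DEFINE_LIST(T, NAME)                                          \
    typedef struct NAME##_node {                                      \
        T data;                                                       \
        struct NAME##_node *next, *prev;                              \
    } NAME##_node;                                                    \
    typedef struct { NAME##_node *head, *tail; } NAME;                \
    static int NAME##_push_back(NAME *l, T value) {                   \
        NAME##_node *n = malloc(sizeof *n);                           \
        if (n == NULL) return -1;                                     \
        n->data = value;                                              \
        n->next = NULL;                                               \
        n->prev = l->tail;                                            \
        if (l->tail) l->tail->next = n; else l->head = n;             \
        l->tail = n;                                                  \
        return 0;                                                     \
    }                                                                 \
    static void NAME##_clear(NAME *l) {                               \
        NAME##_node *n = l->head;                                     \
        while (n) { NAME##_node *next = n->next; free(n); n = next; } \
        l->head = l->tail = NULL;                                     \
    }

DEFINE_LIST(int, int_list)
DEFINE_LIST(double, double_list)
```

Each instantiation is fully typed: int_list_push_back(&il, 1) takes an int, double_list_push_back(&dl, 2.5) takes a double, and no casts from void* appear anywhere. The cost is that every element type needs its own DEFINE_LIST line.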
C doesn't have any runtime type information and doesn't have a type "Type". Types are meaningless once the code has been compiled. So the language provides no solution to what you ask.
One common reason you would want to have a type available at runtime is that you have some code that might see different instances of your container and must do different things for different types stored in the container. You can easily solve such a situation using an enum, e.g.
enum ElementType
{
ET_INT, // int
ET_DOUBLE, // double
ET_CAR, // struct Car
// ...
};
and enumerate any type here that should ever go into your container. Another reason is if your container should take ownership of the objects stored in it and therefore must know how to destroy them (and sometimes how to clone them). For such cases, I recommend the use of function pointers:
typedef void (*ElementDeleter)(void *element);
typedef void *(*ElementCloner)(const void *element);
Then extend your struct to contain these:
typedef struct list {
node *head;
node *tail;
ElementDeleter deleter;
ElementCloner cloner;
} list;
Make sure they point to functions that actually delete or clone an element of the type stored in your container, and then use them where needed; e.g., in a remove function you could do something like
myList->deleter(myNode->data);
// delete the contained element without knowing its type
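A minimal sketch of a container that owns its elements through a deleter. To keep it short it is a simplified singly linked list without a cloner, and list_push_front, list_remove_front and int_deleter are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*ElementDeleter)(void *element);

/* Simplified singly linked list, just enough to show the deleter. */
typedef struct node { void *data; struct node *next; } node;
typedef struct list { node *head; ElementDeleter deleter; } list;

/* Prepends an element; returns 0 on success, -1 if malloc fails. */
int list_push_front(list *l, void *data) {
    node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->data = data;
    n->next = l->head;
    l->head = n;
    return 0;
}

/* Unlinks and frees the first node; the list's deleter releases the
   element itself, so this code never needs to know the element type. */
void list_remove_front(list *l) {
    node *n = l->head;
    if (n == NULL) return;
    l->head = n->next;
    if (l->deleter != NULL)
        l->deleter(n->data);
    free(n);
}

/* Hypothetical deleter for heap-allocated ints; it also counts its calls. */
static int deleted_count = 0;
void int_deleter(void *element) {
    free(element);
    deleted_count++;
}
```

A list created as { NULL, int_deleter } can then take malloc'd ints, and each list_remove_front call frees both the node and the int.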
Create an enum type that stores the data type, and allocate memory according to this enum. This can be done with a switch/case construct.
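A minimal sketch of that suggestion, assuming an enum with just ET_INT and ET_DOUBLE; element_size and alloc_element are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

enum ElementType { ET_INT, ET_DOUBLE };

/* Size of one element for each tag; 0 for an unknown tag. */
size_t element_size(enum ElementType t) {
    switch (t) {
    case ET_INT:    return sizeof(int);
    case ET_DOUBLE: return sizeof(double);
    }
    return 0;
}

/* Allocates storage for one element of the tagged type.
   Returns NULL on failure or for an unknown tag; the caller frees it. */
void *alloc_element(enum ElementType t) {
    size_t n = element_size(t);
    return n ? malloc(n) : NULL;
}
```

Storing the tag next to the pointer returned by alloc_element tells later code how to interpret the memory.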
Unlike Java or C++, C does not provide type-safe generics. To answer your question succinctly, start by rearranging your node type this way:
struct node {
struct node *prev; /* put these at front */
struct node *next;
/* no data here */
};
You could then separately declare nodes carrying any data:
struct data_node {
struct data_node *prev; // keep these two members at the front
struct data_node *next; // and in the same order as in struct node.
// you can add more data members here.
};
/* OR... */
struct data_node2 {
struct node node_data; /* WARNING: this may look a bit safer, but it only works if node_data is placed at the front. */
/* more data ... */
};
You can then create a library that operates on data-less lists of nodes.
void list_add(list* l, struct node* n);
void list_remove(list* l, struct node* n);
/* etc... */
Then, by casting, you can use this generic list API to operate on your own lists:
You can have some sort of type information in your list declaration, for what it's worth, since C does not provide meaningful type protection.
struct data_list
{
struct data_node *head; /* this makes intent clear. */
struct data_node *tail;
};
struct data2_list
{
struct data_node2 *head;
struct data_node2 *tail;
};
/* ... */
struct data_node *my_data_node = malloc(sizeof(struct data_node));
struct data_node2 *my_data_node2 = malloc(sizeof(struct data_node2));
/* ... */
list_add((list*)&my_list, (struct node *)my_data_node);
list_add((list*)&my_list2, &(my_data_node2->node_data));
/* the WARNING above is because one could also write this, which is only valid while node_data is the first member: */
list_add((list*)&my_list2, (struct node *)my_data_node2);
/* etc... */
These two techniques generate the same object code, so which one you choose is up to you, really.
As an aside, consider avoiding the typedef struct notation and writing the struct keyword explicitly; note that in C, unlike C++, the keyword can only be omitted when a typedef exists. IMHO, writing it out increases readability in the long run. You can be certain that some will and some won't agree with me on this subject, though.
I'd like to write a library in C and I don't know the recommended way to do it. For example, I have a structure and multiple functions like this:
typedef struct example
{
int *val;
struct example *next;
} Example;
and I have a build function for each of several types of val:
Example* build() { do sth };
Example* buildf() { do sth }; // val is float
Example* buildd() { do sth }; // val is double
What is the better practice (as used in "professional" libraries): using a pointer to void and casting, or having a structure for all the possibilities (int, float, double)?
Use a union and some way to store type info:
typedef struct example
{
enum{ T_STRUCT_WITH_INT, T_STRUCT_WITH_FLOAT, T_SO_ON } type;
union {
int val_int;
float val_float;
} val;
struct example *next;
} Example;
Access the fields after checking type, e.g. s->val.val_int.
In C11 the union can be anonymous, and its fields can then be accessed directly, e.g. s->val_int.
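A short C11 sketch of the anonymous-union variant. T_DOUBLE and val_double are added to mirror buildd() from the question, and make_int, make_double and example_format are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct example {
    enum { T_INT, T_FLOAT, T_DOUBLE } type;
    union {                 /* C11 anonymous union */
        int    val_int;
        float  val_float;
        double val_double;
    };
    struct example *next;
} Example;

/* Hypothetical builders in the spirit of build()/buildd(). */
Example make_int(int v)       { Example e = { .type = T_INT,    .val_int = v };    return e; }
Example make_double(double v) { Example e = { .type = T_DOUBLE, .val_double = v }; return e; }

/* Formats the value according to its tag; fields are accessed directly. */
void example_format(const Example *s, char *buf, size_t n) {
    switch (s->type) {
    case T_INT:    snprintf(buf, n, "%d", s->val_int);    break;
    case T_FLOAT:  snprintf(buf, n, "%g", s->val_float);  break;
    case T_DOUBLE: snprintf(buf, n, "%g", s->val_double); break;
    }
}
```

Formatting make_int(42) yields "42" and make_double(0.5) yields "0.5"; members not named in the initializer, such as next, are zero-initialized.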
This is primarily based on some combination of opinion, experience and the specific requirements at hand.
The following approach is possible, inspired by some container library work by Jacob Navia. I've never used it myself:
struct container_node {
struct container_node *link_here, *link_there, *link_elsewhere;
/*...*/
char data[0]; /* GNU C zero-length array, a pre-C99 stand-in for a "flexible array member" (C99: char data[];) */
};
struct container_node *container_node_alloc(size_t data_size);
The allocation function allocates the node large enough that bytes data[0] through data[data_size-1] are available. Through another set of API functions, user data of arbitrary type can be copied in and out.
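A minimal sketch of that allocation scheme using a standard C99 flexible array member. Only container_node_alloc is named in the answer; the data_size field and the container_node_put/container_node_get copy functions are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct container_node {
    struct container_node *next;
    size_t data_size;
    char data[];            /* C99 flexible array member */
};

/* Allocates a node with room for data_size bytes of payload. */
struct container_node *container_node_alloc(size_t data_size) {
    struct container_node *n = malloc(sizeof *n + data_size);
    if (n != NULL) {
        n->next = NULL;
        n->data_size = data_size;
    }
    return n;
}

/* Copy user data in and out with memcpy, so the alignment of data[]
   never matters for the user's type. */
void container_node_put(struct container_node *n, const void *src) {
    memcpy(n->data, src, n->data_size);
}

void container_node_get(const struct container_node *n, void *dst) {
    memcpy(dst, n->data, n->data_size);
}
```

Copying through memcpy is the design choice that makes a char payload safe here; reading the bytes back via a cast such as *(double *)n->data could be misaligned.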
The following approach is sometimes called "intrusive container". The container defines only a "base class" consisting of the link structure. The user must embed this structure into their own structure:
struct container_node {
struct container_node *next, *prev;
};
void container_insert(struct container *container, struct container_node *n);
struct container_node *container_first(struct container *container);
The user does this:
struct my_widget {
struct container_node container_links;
int widget_height;
/* ... */
};
/* .... */
/* We don't insert my_widget, but rather its links base. */
container_insert(&widg_container, &widget->container_links);
Some macros are used to convert between a pointer to the widget and a pointer to the container links. See the container_of macro used widely in the Linux kernel:
struct my_widget *wptr = container_of(container_first(&widg_container),
struct my_widget, container_links);
See this question.
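A self-contained sketch of the intrusive pattern. The container_of here is a simplified version built on offsetof from <stddef.h> (the kernel's macro adds type checking), and struct container, container_init and the sentinel-based circular list are assumptions, since the answer only names container_insert and container_first. The links are deliberately placed second in struct my_widget to show that the offset is handled.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */

/* Simplified container_of: step back from the member to its enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct container_node { struct container_node *next, *prev; };

/* Hypothetical container: circular list with a sentinel head node. */
struct container { struct container_node head; };

void container_init(struct container *c) {
    c->head.next = c->head.prev = &c->head;
}

/* Inserts at the tail, just before the sentinel. */
void container_insert(struct container *c, struct container_node *n) {
    n->prev = c->head.prev;
    n->next = &c->head;
    c->head.prev->next = n;
    c->head.prev = n;
}

/* Returns the first node, or NULL if the container is empty. */
struct container_node *container_first(struct container *c) {
    return c->head.next == &c->head ? NULL : c->head.next;
}

struct my_widget {
    int widget_height;
    struct container_node container_links;   /* need not be first */
};
```

After container_insert(&widg_container, &widget.container_links), container_of(container_first(&widg_container), struct my_widget, container_links) returns &widget again.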
Then there are approaches that store a union in each node, which holds an integer, a floating-point value, or a pointer. In that case the data is allocated separately (though not necessarily: if the caller controls the allocation of the nodes, it's still possible to put the node structure and the user data in a buffer that came from a single malloc call).
Finally, there are also approaches which wrap these techniques with preprocessor templating, an example of which are the BSD QUEUE macros.
Is there any way to create a generic data structure in C and use functions according to the stored data type: a structure that holds various types of data and, for example, can be printed according to the stored data?
For example,
Suppose I want to make a binary search tree that stores only floats and ints. The natural approach would be to create an enumeration for int and float. It would look something like this:
typedef enum {INT, FLOAT} DataType;
typedef struct node
{
void *data;
DataType t;
struct node *left,
*right;
}Node;
If I want to print it out:
void printTree(Node *n)
{
if (n != NULL)
{
if (n->t == INT)
{
int *a = (int *) n->data;
printf("%d ", *a);
}
else
{
float *a = (float *) n->data;
printf("%f ", *a);
}
printTree(n->left);
printTree(n->right);
}
}
That's OK, but I want to store other data types too, such as a stack, a queue or something else. That's why I created a tree that does not depend on a specific data type:
typedef struct node
{
void *data;
struct node *left,
*right;
}Node;
If I want to print it out, I use a callback function:
void printTree(Node *n, void (*print)(const void *))
{
if (n != NULL)
{
print(n->data);
printTree(n->left, print);
printTree(n->right, print);
}
}
But it falls apart when I try to insert an integer and a float and print them out. My question is: is there a way to create a generic data structure where a routine depends on a specific data type in one situation but not in another, for mixed data types? In this situation, should I create a structure that stores ints and floats, and use a print function like the first one above inside the callback?
Note: I declared just a node structure and did everything on it to keep things simple, but the idea is to split the structure into .h and .c files, with all the abstraction that data structures involve.
I would suggest trying something like the following. You'll notice that Node contains a tagged union that holds either a pointer, an integer, or a floating-point number. When the node holds a pointer, the custom print function is called; in the other cases, the appropriate printf format is used.
typedef enum {POINTER, INT, FLOAT} DataType;
typedef struct node
{
DataType t;
union {
void *pointer;
int integer;
float floating;
} data;
struct node *left,
*right;
} Node;
void printTree(Node *n, void (*print)(const void *))
{
if (n != NULL) {
switch (n->t) {
case POINTER:
print(n->data.pointer);
break;
case INT:
printf("%d ", n->data.integer);
break;
case FLOAT:
printf("%f ", n->data.floating);
break;
}
printTree(n->left, print);
printTree(n->right, print);
}
}
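A self-contained run of the tagged-union tree, with the nodes linked by hand instead of by a real BST insert. The printString callback, which assumes POINTER nodes hold C strings and counts its calls, is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum {POINTER, INT, FLOAT} DataType;

typedef struct node {
    DataType t;
    union {
        void *pointer;
        int integer;
        float floating;
    } data;
    struct node *left, *right;
} Node;

/* Pre-order traversal: custom callback for POINTER nodes, printf otherwise. */
void printTree(Node *n, void (*print)(const void *)) {
    if (n == NULL)
        return;
    switch (n->t) {
    case POINTER: print(n->data.pointer); break;
    case INT:     printf("%d ", n->data.integer); break;
    case FLOAT:   printf("%f ", n->data.floating); break;
    }
    printTree(n->left, print);
    printTree(n->right, print);
}

/* Hypothetical callback for nodes whose pointer is a C string. */
static int print_calls = 0;
void printString(const void *p) {
    printf("%s ", (const char *)p);
    print_calls++;
}
```

For a root holding the int 7, a left child holding the string "hello" and a right child holding the float 2.5, printTree(&root, printString) prints 7 hello 2.500000, mixing ints, floats and custom data in one tree.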
C doesn't support this kind of generic data types/structures. You have a few options you can go with:
If you have the opportunity to use Clang as the compiler, there's a language extension to overload functions in C. But you have to cast the argument to the specific type, so the compiler knows which function to call.
Use C++
although you still have to cast the argument, so the compiler knows which of the available functions called print it has to call.
use templates
Create a function called print which takes something like
struct data_info {
void *data;
enum_describing_type type;
};
print does a switch and calls the appropriate printInt, printFloat etc.
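A minimal sketch of that dispatcher. The enum data_type stands in for the answer's enum_describing_type placeholder, and, so the result can be checked, the printers here write into a caller-supplied buffer instead of stdout; both are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical concrete tag enum standing in for enum_describing_type. */
enum data_type { TYPE_INT, TYPE_FLOAT };

struct data_info {
    void *data;
    enum data_type type;
};

/* Type-specific printers. */
static void printInt(const int *p, char *buf, size_t n)     { snprintf(buf, n, "%d", *p); }
static void printFloat(const float *p, char *buf, size_t n) { snprintf(buf, n, "%.2f", *p); }

/* Generic entry point: switch on the tag and forward to the right printer. */
void print(const struct data_info *info, char *buf, size_t n) {
    switch (info->type) {
    case TYPE_INT:   printInt(info->data, buf, n);   break;
    case TYPE_FLOAT: printFloat(info->data, buf, n); break;
    }
}
```

Passing { &i, TYPE_INT } with i == 5 produces "5", and { &f, TYPE_FLOAT } with f == 1.5f produces "1.50".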
uthash is a collection of header files that provide typed hash table, linked list, etc. implementations, all using C preprocessor macros.
Can structures contain functions?
No, but they can contain function pointers.
If your intent is to do some form of polymorphism in C then yes, it can be done:
typedef struct {
int (*open)(void *self, char *fspec);
int (*close)(void *self);
int (*read)(void *self, void *buff, size_t max_sz, size_t *p_act_sz);
int (*write)(void *self, void *buff, size_t max_sz, size_t *p_act_sz);
// And data goes here.
} tCommClass;
The typedef above was for a structure I created for a general purpose communications library. In order to initialise the variable, you would:
tCommClass *makeCommTcp (void) {
tCommClass *comm = malloc (sizeof (tCommClass));
if (comm != NULL) {
comm->open = &tcpOpen;
comm->close = &tcpClose;
comm->read = &tcpRead;
comm->write = &tcpWrite;
}
return comm;
}
tCommClass *makeCommSna (void) {
tCommClass *comm = malloc (sizeof (tCommClass));
if (comm != NULL) {
comm->open = &snaOpen;
comm->close = &snaClose;
comm->read = &snaRead;
comm->write = &snaWrite;
}
return comm;
}
tCommClass *commTcp = makeCommTcp();
tCommClass *commSna = makeCommSna();
Then, to call the functions, something like:
// Pass commTcp as first params so we have a self/this variable
// for accessing other functions and data area of object.
int stat = (commTcp->open)(commTcp, "bigiron.box.com:5000");
In this way, a single type could be used for TCP, SNA, RS232 or even carrier pigeons, with exactly the same interface.
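To show the same interface driving a backend that runs anywhere, here is a sketch with a hypothetical in-memory "transport". makeCommMem, the mem* functions, and the buf/len/is_open data fields are assumptions standing in for the real TCP or SNA implementations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct tCommClass tCommClass;
struct tCommClass {
    int (*open)(void *self, char *fspec);
    int (*close)(void *self);
    int (*read)(void *self, void *buff, size_t max_sz, size_t *p_act_sz);
    int (*write)(void *self, void *buff, size_t max_sz, size_t *p_act_sz);
    /* Data for the hypothetical in-memory backend. */
    char buf[64];
    size_t len;
    int is_open;
};

static int memOpen(void *self, char *fspec) {
    tCommClass *c = self;
    (void)fspec;                      /* nothing to connect to */
    c->is_open = 1;
    c->len = 0;
    return 0;
}

static int memClose(void *self) {
    ((tCommClass *)self)->is_open = 0;
    return 0;
}

/* Appends up to max_sz bytes to the internal buffer. */
static int memWrite(void *self, void *buff, size_t max_sz, size_t *p_act_sz) {
    tCommClass *c = self;
    size_t room = sizeof c->buf - c->len;
    size_t n = max_sz < room ? max_sz : room;
    if (!c->is_open) return -1;
    memcpy(c->buf + c->len, buff, n);
    c->len += n;
    *p_act_sz = n;
    return 0;
}

/* Copies up to max_sz buffered bytes out (without consuming them). */
static int memRead(void *self, void *buff, size_t max_sz, size_t *p_act_sz) {
    tCommClass *c = self;
    size_t n = max_sz < c->len ? max_sz : c->len;
    if (!c->is_open) return -1;
    memcpy(buff, c->buf, n);
    *p_act_sz = n;
    return 0;
}

tCommClass *makeCommMem(void) {
    tCommClass *comm = malloc(sizeof *comm);
    if (comm != NULL) {
        comm->open = memOpen;
        comm->close = memClose;
        comm->read = memRead;
        comm->write = memWrite;
        comm->len = 0;
        comm->is_open = 0;
    }
    return comm;
}
```

Callers use it exactly like commTcp: comm->open(comm, "mem:"), then comm->write and comm->read, always passing the object itself as the first argument.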
Not in C. struct types can only contain data.
From Section 6.7.2.1 of the ISO C99 Standard.
A structure or union shall not contain a member with incomplete or function type (hence,
a structure shall not contain an instance of itself, but may contain a pointer to an instance
of itself), except that the last member of a structure with more than one named member
may have incomplete array type; such a structure (and any union containing, possibly
recursively, a member that is such a structure) shall not be a member of a structure or an
element of an array.
No, you cannot. In C, a structure can contain neither a function declaration nor a function definition. A structure can only contain data members, including pointers and pointers to functions. You can make a pointer to a function and then call it through the structure.
#include <stdio.h>
struct full_name
{
const char *fname;
const char *lname;
void (*show)(const char *, const char *);
};
void show(const char *a1, const char *a2)
{
printf("%s-%s\n", a1, a2);
}
int main(void)
{
struct full_name loki;
loki.fname = "Mohit";
loki.lname = "Dabas";
loki.show = show;
loki.show(loki.fname, loki.lname);
return 0;
}
In C, structures are supposedly allowed to contain only data values, not function pointers. But the following works perfectly fine when checked with gcc:
#include <stdio.h>
struct st_func_ptr{
int data;
int (*callback) ();
};
int cb(){
printf(" Inside the call back \n");
return 0;
}
int main() {
struct st_func_ptr sfp = {10, cb};
printf("return value = %d \n",sfp.callback());
printf(" Inside main\n");
return 0;
}
So, I am confused...
It's all right.
In the Linux kernel code, you will find many structures that contain function pointers,
such as:
/*
* The type of device, "struct device" is embedded in. A class
* or bus can contain devices of different types
* like "partitions" and "disks", "mouse" and "event".
* This identifies the device type and carries type-specific
* information, equivalent to the kobj_type of a kobject.
* If "name" is specified, the uevent will contain it in
* the DEVTYPE variable.
*/
struct device_type {
const char *name;
struct attribute_group **groups;
int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
void (*release)(struct device *dev);
int (*suspend)(struct device * dev, pm_message_t state);
int (*resume)(struct device * dev);
};
Yes, it's possible to declare a function pointer member, but a function definition is not allowed; the member has to be a function pointer.
It's based on C99 tagged structures.
Lokesh V
They can, but there is no inherent advantage in usual C programming.
In C, all functions are in the global space anyway, so you get no information hiding by tucking them into a struct. paxdiablo's example is a way to organize functions into a struct, but you still have to dereference each one to use it.
The standard organizational unit of C is the file, with the interfaces in the header and the implementations in the source.
That is how libc is done and that is how almost all C libraries are done.
Modern C compilers allow you to define and implement functions in the same source file, and even implement static functions in header files. This unfortunately leads to some confusion as to what goes where, and you can get unusual solutions like cramming functions into structs, source-only programs with no headers, etc.
You lose the advantage of separating interface from implementation that way.