How is a generic list manipulation function written? - c

I am a beginner in programming, so please go easy on me. I am finding it difficult to get an answer to my question, and I can't get my head around complex code. Can someone please explain, with simple code, how a generic list manipulation function that accepts elements of any kind is written? Thanks in advance.

This is normally done using void pointers:
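#include <stdlib.h>
typedef struct node {
    struct node *next;
    void *data;                              /* points to an element of any type */
} node;
node *insert(node *list, void *data) {       /* prepend; returns the new head */
    node *n = malloc(sizeof *n);
    if (n == NULL) return list;              /* out of memory: list unchanged */
    n->data = data; n->next = list;
    return n;
}
node *delete(node *list, node *to_delete) {  /* frees the node, not its data */
    for (node **pp = &list; *pp != NULL; pp = &(*pp)->next)
        if (*pp == to_delete) { *pp = to_delete->next; free(to_delete); break; }
    return list;
}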
Such manipulation functions do not depend on the actual type of the data, so they can be implemented generically. If you also need to know what kind of value each node holds, you can store a tagged struct in the data field above:
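enum { INT, DOUBLE };            /* assumed type tags; not defined in the original snippet */
typedef struct data {
    int type;                    /* which kind of value `data` points to */
    void *data;
} data;
/* .... */
data d;
d.type = INT;
d.data = malloc(sizeof(int));
node n = {NULL, &d};             /* next = NULL, data = &d */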
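A self-contained sketch of the idea above, assuming an int and a double are stored in the same void*-based list and the tag is used to read them back. The tag names (TYPE_INT, TYPE_DOUBLE), the tagged struct, and the helpers push, make_tagged, sum_list and free_list are hypothetical names chosen for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tags for this sketch. */
enum { TYPE_INT, TYPE_DOUBLE };

/* A tagged value: the tag says how to interpret `value`. */
typedef struct tagged {
    int type;
    void *value;   /* heap copy of the actual value */
} tagged;

/* The list node never looks inside `data`. */
typedef struct lnode {
    struct lnode *next;
    void *data;
} lnode;

/* Generic prepend: works for any pointer; returns the new head. */
lnode *push(lnode *list, void *data)
{
    lnode *n = malloc(sizeof *n);
    if (n == NULL) return list;   /* out of memory: list unchanged */
    n->data = data;
    n->next = list;
    return n;
}

/* Copy `size` bytes from src into a new heap-allocated tagged value. */
tagged *make_tagged(int type, const void *src, size_t size)
{
    tagged *t = malloc(sizeof *t);
    void *copy = malloc(size);
    if (t == NULL || copy == NULL) { free(t); free(copy); return NULL; }
    memcpy(copy, src, size);
    t->type = type;
    t->value = copy;
    return t;
}

/* Type-aware code: switch on the tag to interpret each element. */
double sum_list(const lnode *list)
{
    double sum = 0.0;
    for (; list != NULL; list = list->next) {
        const tagged *t = list->data;
        if (t == NULL) continue;
        switch (t->type) {
        case TYPE_INT:    sum += *(const int *)t->value;    break;
        case TYPE_DOUBLE: sum += *(const double *)t->value; break;
        }
    }
    return sum;
}

/* Free every node together with the tagged value it owns. */
void free_list(lnode *list)
{
    while (list != NULL) {
        lnode *next = list->next;
        tagged *t = list->data;
        if (t != NULL) free(t->value);
        free(t);
        free(list);
        list = next;
    }
}
```

Because make_tagged copies the value, the list does not depend on the lifetime of the variables that were passed in.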

It looks like you need a heterogeneous list. Some pointers:
Make the data element of the list node a generic structure that contains a data-type indicator and the data itself.
/** This should be your data node **/
struct nodedata
{
int datatype;
void *data;
};
/** This should be your list node **/
struct listnode
{
struct nodedata *data;
struct listnode *next;
};
Using the above structure, you can store different types of data.
Use function pointers for comparison functions or invoke different functions depending upon the data type.
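As a sketch of the function-pointer suggestion, the search below walks a list of the structures shown above and delegates the comparison to a caller-supplied function, so the list code never needs to know the data types. The type codes (DT_INT, DT_STRING) and the functions list_find, cmp_int and cmp_string are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum { DT_INT, DT_STRING };   /* hypothetical datatype codes */

struct nodedata {
    int datatype;
    void *data;
};

struct listnode {
    struct nodedata *data;
    struct listnode *next;
};

/* Return the first node for which cmp(node data, key) == 0, or NULL. */
struct listnode *list_find(struct listnode *head, const void *key,
                           int (*cmp)(const struct nodedata *, const void *))
{
    for (; head != NULL; head = head->next)
        if (cmp(head->data, key) == 0)
            return head;
    return NULL;
}

/* Match only int elements equal to *(const int *)key. */
int cmp_int(const struct nodedata *d, const void *key)
{
    if (d->datatype != DT_INT) return 1;   /* different type: no match */
    return *(const int *)d->data != *(const int *)key;
}

/* Match only string elements equal to the C string key. */
int cmp_string(const struct nodedata *d, const void *key)
{
    if (d->datatype != DT_STRING) return 1;
    return strcmp(d->data, key);
}
```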

Related

How do you write generic list without knowing the implementation of structure?

Let's assume there is an employee ADT, such as
//employee.h
typedef struct employee_t employee_t;
employee_t* employee_create(char* company, char* department, char* position);
void employee_free(employee_t* me);
and the client code would be:
#include "employee.h"
employee_t* Kevin = employee_create("Facebook", "Marketing", "Sales");
employee_t* John = employee_create("Microsoft", "R&D", "Engineer");
Now the client wants to use a list ADT to insert Kevin and John into a list for some task.
//list.h
typedef struct list_t list_t;
list_t* list_create(/*might have some arguments*/);
So the client code would then be:
#include "employee.h"
#include "list.h"
employee_t* Kevin = employee_create("Facebook", "Marketing", "Sales");
employee_t* John = employee_create("Microsoft", "R&D", "Engineer");
list_t* employee = list_create(/*might have some arguments*/);
list_insert(employee, Kevin);
list_insert(employee, John);
employee_free(Kevin);
employee_free(John);
list_print(employee); //Oops! How to print structure that you can't see?
Because employee_t is hidden behind an opaque pointer, there is no way for the list to copy (or print) it.
How should the list ADT and its implementation be written?
The usual way to do this is to have your list structure store the data as a void*. For example, assuming your list is a singly linked list:
struct list_t
{
void *data;
struct list_t *next;
};
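Now list_insert would be something like this:
#include <stdlib.h>
list_t *list_insert(list_t *head, void *data)
{
    list_t *newHead = malloc(sizeof *newHead);
    if (newHead == NULL)
        return head;          /* allocation failed: leave the list unchanged */
    newHead->data = data;     /* stores the caller's pointer; no copy is made */
    newHead->next = head;
    return newHead;
}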
If you want to hide the implementation of the struct, you can add functions to extract the data. For example:
void *list_get_data(list_t *head)
{
return head->data;
}
How do you write generic list without knowing the implementation of structure?
Create functions that handle the structure abstractly.
How to write ADT and implementation for list?
list_create() needs to be given helper function pointers for the particular object type, so the list can perform various tasks abstractly:
A copy function like void *employee_copy(const void *emp), so list_insert(employee, Kevin); knows how to copy Kevin.
A free function like void employee_free(void *emp), so list_uninsert(employee) can free members when they are removed one by one or when the list is destroyed.
A print function like int employee_print(void *emp), so list_print(employee) knows how to print each member of its list.
Possibly others.
Rather than passing in 3+ function pointers, consider passing in a struct that contains these pointers; then the list needs only the overhead of 1 pointer: list_create(employee_t_function_list).
You are taking your first steps toward rewriting C++.
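A minimal sketch of that design, assuming a hypothetical list_ops struct that holds the copy, destroy and print callbacks, and a singly linked list that stores its own copies. A plain int element stands in for employee_t; the names list_ops, list_create, list_insert, list_front, list_print, list_destroy and int_ops are illustrative, not an existing API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Type-specific helpers the list needs; one instance per element type. */
typedef struct list_ops {
    void *(*copy)(const void *elem);   /* return a heap copy of elem */
    void (*destroy)(void *elem);       /* free a copy made by copy() */
    int (*print)(const void *elem);    /* print one element */
} list_ops;

typedef struct list_node {
    void *elem;
    struct list_node *next;
} list_node;

typedef struct list_t {
    const list_ops *ops;   /* one pointer of overhead, shared by all nodes */
    list_node *head;
} list_t;

list_t *list_create(const list_ops *ops)
{
    list_t *l = malloc(sizeof *l);
    if (l != NULL) { l->ops = ops; l->head = NULL; }
    return l;
}

/* Store a copy, so the caller may free its original afterwards. */
int list_insert(list_t *l, const void *elem)
{
    list_node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->elem = l->ops->copy(elem);
    if (n->elem == NULL) { free(n); return -1; }
    n->next = l->head;
    l->head = n;
    return 0;
}

/* Most recently inserted element, or NULL if the list is empty. */
const void *list_front(const list_t *l)
{
    return l->head != NULL ? l->head->elem : NULL;
}

void list_print(const list_t *l)
{
    for (const list_node *n = l->head; n != NULL; n = n->next)
        l->ops->print(n->elem);
}

/* Destroy every stored copy, then the nodes and the list itself. */
void list_destroy(list_t *l)
{
    list_node *n = l->head;
    while (n != NULL) {
        list_node *next = n->next;
        l->ops->destroy(n->elem);
        free(n);
        n = next;
    }
    free(l);
}

/* Callbacks for int elements (a stand-in for employee_t). */
static void *int_copy(const void *e)
{
    int *p = malloc(sizeof *p);
    if (p != NULL) *p = *(const int *)e;
    return p;
}
static void int_destroy(void *e) { free(e); }
static int int_print(const void *e) { return printf("%d\n", *(const int *)e); }

const list_ops int_ops = { int_copy, int_destroy, int_print };
```

With this design, the list in the question could be created with something like list_create(&employee_ops), and freeing Kevin and John after inserting them would no longer leave dangling pointers in the list.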
You can use what is called an intrusive list. This concept is heavily used in the Linux kernel.
All you need to do is embed the node in the struct and let the generic code operate only on that struct member.
#include <stddef.h>
struct list_node {
struct list_node *next;
};
struct list_head {
struct list_node *first;
};
/* translates pointer to a node to pointer to containing structure
* for each pointer `ptr` to a `struct S` that contain `struct list_node node` member:
* list_entry(&ptr->node, S, node) == ptr
*/
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
void list_insert(struct list_head *head, struct list_node *node) {
node->next = head->first;
head->first = node;
}
#define LIST_FOREACH(it, head) \
for (struct list_node *it = (head)->first; it; it = it->next)
The interface can easily be extended with other helpers like list_is_empty, list_first and list_remove_first, or by embedding a size field in struct list_head.
Example usage:
#include <stdio.h>
typedef struct {
    char *name;
    struct list_node node;
} employee_t;
typedef struct {
    char *name;
    struct list_head employees;
} employer_t;
int main(void) {
    employer_t company = { .name = "The Company" };   /* employees.first starts as NULL */
    employee_t bob = { .name = "Bob" };
    employee_t mark = { .name = "Mark" };
    list_insert(&company.employees, &bob.node);
    list_insert(&company.employees, &mark.node);
    printf("Employees of %s:\n", company.name);
    LIST_FOREACH(n, &company.employees) {
        employee_t *e = list_entry(n, employee_t, node);
        printf("%s\n", e->name);
    }
    return 0;
}
Prints:
Employees of The Company:
Mark
Bob
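The helpers mentioned earlier (list_is_empty, list_first, list_remove_first, and a size field in struct list_head) could be sketched as follows. These are hypothetical extensions of the answer's interface, and the struct definitions are repeated so the sketch compiles on its own.

```c
#include <stddef.h>

struct list_node {
    struct list_node *next;
};

struct list_head {
    struct list_node *first;
    size_t size;               /* hypothetical: number of linked nodes */
};

void list_insert(struct list_head *head, struct list_node *node)
{
    node->next = head->first;
    head->first = node;
    head->size++;
}

int list_is_empty(const struct list_head *head)
{
    return head->first == NULL;
}

struct list_node *list_first(const struct list_head *head)
{
    return head->first;
}

/* Unlink and return the first node, or NULL if the list is empty.
 * The node's memory belongs to its containing struct, so nothing is freed. */
struct list_node *list_remove_first(struct list_head *head)
{
    struct list_node *node = head->first;
    if (node != NULL) {
        head->first = node->next;
        node->next = NULL;
        head->size--;
    }
    return node;
}
```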
Note that the list_* interface can easily be used for other types as well.
See the article for more information about using this concept for a doubly linked list.
Edit
Note that list_entry invokes subtle undefined behavior.
It is related to performing pointer arithmetic outside the struct member object, but still within the parent object.
Note that any object can be treated as an array of chars.
This code works on all major compilers and is very unlikely ever to fail, because that would break a lot of existing and heavily used code (like the Linux kernel or Git).
The program is strictly conforming if struct list_node is the first member of the embedding struct, because the C standard allows safe conversion between a pointer to a structure and a pointer to its first member.
To be strictly conforming when node is not the first member, the issue can be circumvented by forming the pointer to struct list_node not as &bob.node but by pointer arithmetic on a pointer to bob. The result would be:
(struct list_node*)((char*)&bob + offsetof(employee_t, node))
However, this syntax is really nasty, so personally I would go for &bob.node.

Best practice for generic data structure implementation in C

In my adventures implementing generic data structures in C, I've come across a dilemma. For example, in the following code:
void add_something(avl_tree_t * my_tree) {
int new_element = 123;
avl_insert(my_tree, (void*)&new_element);
}
int main() {
avl_tree_t * my_tree = avl_create();
add_something(my_tree);
// do stuff
avl_print(my_tree, function_that_prints_ints);
exit(0);
}
In which avl_insert is defined as
void avl_insert(avl_tree_t * tree, void * data) {
avl_node_t * new_node = malloc(sizeof(struct avl_node));
new_node->data = data;
// do tree balancing stuff
}
In order for my generic insertion function to work, I have to pass it a void * to store. In this case that means passing the address of the new int, which is converted to a void *. If I am not mistaken, once we're back in main, that address is no longer valid, because new_element was a local variable of add_something.
One way I looked into solving this is to pass the size of the stored elements as a parameter to avl_create, and then allocate memory for a copy of each element I insert. This works because you no longer need the original address or value of whatever you added.
Another approach that works is to use the data structure only within a single function, which is obviously not viable.
My question is this: what is the best way to go about storing statically allocated data in a generic data structure, be it basic C types or user made structures?
Thank you in advance.
To store pointers to data with automatic storage duration, yes, you would have to know the size of the elements in the container and allocate and copy the pointed-to data.
The simplest way is to just allocate and copy in all cases, optionally using a user-specified clone() or create() function to make deep copies, if necessary. This also entails the use of a user-specified destroy() function to dispose of the copies properly (again, if necessary).
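A minimal sketch of the allocate-and-copy approach, assuming the element size is passed when the container is initialized (as the question suggests) plus an optional user-supplied destroy callback. A plain singly linked list stands in for the AVL tree to keep it short; container, container_init, container_insert and container_free are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

typedef struct cnode {
    struct cnode *next;
    void *data;                      /* container-owned copy */
} cnode;

typedef struct container {
    size_t elem_size;                /* size of every stored element */
    void (*destroy)(void *elem);     /* optional: releases resources the element owns */
    cnode *head;
} container;

void container_init(container *c, size_t elem_size, void (*destroy)(void *))
{
    c->elem_size = elem_size;
    c->destroy = destroy;
    c->head = NULL;
}

/* Copy elem_size bytes from elem, so elem may point to a local variable. */
int container_insert(container *c, const void *elem)
{
    cnode *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->data = malloc(c->elem_size);
    if (n->data == NULL) { free(n); return -1; }
    memcpy(n->data, elem, c->elem_size);   /* shallow copy */
    n->next = c->head;
    c->head = n;
    return 0;
}

/* Call destroy (if any) on each copy, then free the copy and its node. */
void container_free(container *c)
{
    while (c->head != NULL) {
        cnode *next = c->head->next;
        if (c->destroy != NULL) c->destroy(c->head->data);
        free(c->head->data);
        free(c->head);
        c->head = next;
    }
}

/* Mirrors add_something() from the question: inserting a local is now safe. */
void add_something(container *c)
{
    int new_element = 123;
    container_insert(c, &new_element);
}
```

memcpy only makes a shallow copy; elements that own pointers would need a user-supplied clone() instead, as described above.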
To avoid the allocation, you need some kind of state variable that tells the container whether it should allocate and copy, or just store the pointer value itself.
Note that this should apply to the container object, not to the individual nodes or elements. If a container stores data in one way or the other, it should store all data that way. See Principle of Least Astonishment.
This is the more complex approach, since you have to be sure to use the correct process for adding and deleting elements based on the state variable. It's usually much simpler to just make sure you never pass in a pointer to a value with automatic storage duration.
Use a mix-in style; i.e. do not make the data part of the node, but the node part of the data:
struct avl_node {
struct avl_node *parent;
struct avl_node *left;
struct avl_node *right;
};
struct person {
char const *name;
struct avl_node node;
};
struct animal {
struct avl_node node;
int dangerousness;
};
A constructor for animal could look like:
struct animal *animal_create(int d)
{
    struct animal *animal = malloc(sizeof *animal);
    if (animal == NULL)
        return NULL;
    *animal = (struct animal) {
        .node = { .parent = NULL, .left = NULL, .right = NULL },  /* unlinked node */
        .dangerousness = d,
    };
    return animal;
}
The generic AVL tree operations could look like
void avl_tree_insert(struct avl_node **root, struct avl_node *node,
int (*cmp)(struct avl_node const *a, struct avl_node const *b))
{
/* .... */
}
and a cmp function for animal like
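#include <stddef.h>
/* container_of is not standard C; this is a common simplified definition */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
int animal_cmp(struct avl_node const *a_, struct avl_node const *b_)
{
    struct animal const *a = container_of(a_, struct animal, node);
    struct animal const *b = container_of(b_, struct animal, node);
    /* avoids the overflow that a->dangerousness - b->dangerousness can cause */
    return (a->dangerousness > b->dangerousness) - (a->dangerousness < b->dangerousness);
}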
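To show the mix-in approach end to end without a full AVL implementation, here is a sketch that substitutes a plain, unbalanced binary-search-tree insert (no rebalancing) using the same node layout and comparison signature. container_of is defined locally because it is not part of standard C; bst_insert and least_dangerous are hypothetical helpers.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct avl_node {
    struct avl_node *parent;
    struct avl_node *left;
    struct avl_node *right;
};

struct animal {
    struct avl_node node;
    int dangerousness;
};

/* Unbalanced BST insert; a real AVL insert would rebalance afterwards. */
void bst_insert(struct avl_node **root, struct avl_node *node,
                int (*cmp)(struct avl_node const *, struct avl_node const *))
{
    struct avl_node *parent = NULL;
    struct avl_node **link = root;
    while (*link != NULL) {
        parent = *link;
        link = cmp(node, parent) < 0 ? &parent->left : &parent->right;
    }
    node->parent = parent;
    node->left = node->right = NULL;
    *link = node;
}

/* The tree code only sees avl_node; container_of recovers the animal. */
int animal_cmp(struct avl_node const *a_, struct avl_node const *b_)
{
    struct animal const *a = container_of(a_, struct animal, node);
    struct animal const *b = container_of(b_, struct animal, node);
    return (a->dangerousness > b->dangerousness) -
           (a->dangerousness < b->dangerousness);
}

/* Leftmost node = least dangerous animal, or NULL for an empty tree. */
struct animal *least_dangerous(struct avl_node *root)
{
    if (root == NULL) return NULL;
    while (root->left != NULL) root = root->left;
    return container_of(root, struct animal, node);
}
```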

Polymorphic data structures in C

I am a C beginner with quite a lot of OOP experience (C#) and I am having trouble understanding how some notion of "polymorphism" can be achieved in C.
Right now, I am thinking about how to capture the logical structure of a file system using structs. I have a folder that contains both folders and files. Folders in this folder can contain other files and folders, etc.
My approach:
typedef enum { file, folder } node_type;
typedef struct node {
node_type type;
char *name;
struct node *next;
struct node *children;
} node;
Is this the best I can do? I have found a lot of posts on "polymorphism in C", but I would like to see how a polymorphic data structure like this can be built cleanly and efficiently (in terms of memory wasted on unused members of those structures).
Thanks.
I'm not sure I understand what you want, but I guess you want to do something like this:
typedef struct
{
int type; // file or folder?
} Item;
typedef struct
{
Item base; // shared header: must be the first member
// data related to a file
} File;
typedef struct
{
Item base; // shared header: must be the first member
// data related to a folder - like pointer to list of Item
} Folder;
As long as both structures start with the same initial members (here the embedded Item) and only add members after them, a pointer to a File or a Folder can safely be used as a pointer to an Item.
Check this one out as well: How can I simulate OO-style polymorphism in C?
Edit: I'm not sure about the syntax above (took it from the link above). I'm used to writing it this way instead:
typedef struct
{
int type;
// data for file
} File;
typedef struct
{
int type;
// data for folder - list, etc
} Folder;
C has no intrinsic notion of polymorphism.
You will end up implementing the mechanisms that you want from scratch. That's not a bad thing; it gives you a lot more flexibility. For example, C++ virtual methods are hard-wired per class, so you can't change method pointers per instance.
Here are a few ideas:
Your node_type field provides a way to do a runtime type query. Going further, you can pack multiple types into one struct using a discriminated (or tagged) union: http://en.wikipedia.org/wiki/Tagged_union. I'm not sure whether a variant type qualifies as OO though.
Polymorphism is usually about behavior. You could store function pointers ("methods") in the struct, with pointers to different functions providing different behavior for different object instances. The C++ way of doing things is for each class to have a table of function pointers, then each object instance references the table for its class (incidentally the table pointers can also play the role of your node_type for RTTI). This is called a virtual method table.
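A minimal sketch of the virtual-method-table idea applied to the file system example. The names fs_node, fs_vtable, fs_file, fs_folder, file_size and folder_size and the size operation are hypothetical; each node points to a table shared by all nodes of its kind, and that pointer also identifies the kind at run time.

```c
#include <stddef.h>

struct fs_node;

/* One table per "class", shared by every instance of that kind. */
struct fs_vtable {
    const char *kind;                           /* doubles as RTTI */
    long (*size)(const struct fs_node *self);   /* "virtual" method */
};

struct fs_node {
    const struct fs_vtable *vt;
    struct fs_node *next;       /* next sibling in the parent folder */
};

struct fs_file {
    struct fs_node base;        /* first member: a file is-a node */
    long bytes;
};

struct fs_folder {
    struct fs_node base;
    struct fs_node *children;   /* linked through their next pointers */
};

static long file_size(const struct fs_node *self)
{
    return ((const struct fs_file *)self)->bytes;
}

static long folder_size(const struct fs_node *self)
{
    long total = 0;
    const struct fs_node *c = ((const struct fs_folder *)self)->children;
    for (; c != NULL; c = c->next)
        total += c->vt->size(c);   /* dynamic dispatch through the table */
    return total;
}

const struct fs_vtable file_vtable = { "file", file_size };
const struct fs_vtable folder_vtable = { "folder", folder_size };
```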
Data inheritance means that subclasses contain all of the base class' data members plus some extra stuff. In C the easiest way to do this is by embedding the base class struct at the head of the derived class struct. That way a pointer to derived is a pointer to base.
typedef struct BaseClass {
int baseMember;
} BaseClass;
typedef struct DerivedClass {
BaseClass base;
int derivedMember;
} DerivedClass;
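For example, a function written against BaseClass can be handed a DerivedClass, either through &d.base or by converting the pointer, which is valid because a pointer to a structure, suitably converted, points to its first member. The functions get_base and get_derived are hypothetical.

```c
typedef struct BaseClass {
    int baseMember;
} BaseClass;

typedef struct DerivedClass {
    BaseClass base;          /* must be first for the casts below */
    int derivedMember;
} DerivedClass;

/* Works for any "subclass" that embeds BaseClass as its first member. */
int get_base(const BaseClass *b)
{
    return b->baseMember;
}

/* "Downcast": only valid if b really is the base of a DerivedClass. */
int get_derived(const BaseClass *b)
{
    return ((const DerivedClass *)b)->derivedMember;
}
```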
You could do worse than read "Inside the C++ Object Model" by Stanley B. Lippman. For example, this will help if you want to get an idea of how to implement multiple inheritance.
Here's an illustration of old-school C polymorphism, based on ancient memories of X/Motif.
If you just want a discriminated union (or even just a typed structure with a child pointer that may be null), it's probably simpler in your case.
enum NodeType { TFile, TFolder };
struct Node {
enum NodeType type;
const char *name;
struct Node *next;
};
struct FileNode {
struct Node base_;
};
struct FolderNode {
struct Node base_;
struct Node *children;
/* assuming children are linked with their next pointers ... */
};
Here are the constructors - I'll leave populating the linked lists as an exercise for the reader ...
#include <stdlib.h>
struct Node* create_file(const char *name) {
    struct FileNode *file = malloc(sizeof(*file));
    if (file == NULL) return NULL;   /* allocation failed */
    file->base_.type = TFile;
    file->base_.name = name; /* strdup? */
    file->base_.next = NULL;
    return &file->base_;
}
struct Node* create_folder(const char *name) {
    struct FolderNode *folder = malloc(sizeof(*folder));
    if (folder == NULL) return NULL; /* allocation failed */
    folder->base_.type = TFolder;
    folder->base_.name = name;
    folder->base_.next = NULL;
    folder->children = NULL;
    return &folder->base_;
}
Now we can walk a hierarchy, checking the type of each node and responding appropriately. This relies on base_ being the first member of each struct: C guarantees that the first member is at offset zero, so a pointer to the struct and a pointer to its first member can be converted into each other. If base_ is not the first member (or you need multiple inheritance), you have to use offsetof to convert between base and "derived" types.
void walk(struct Node *root,
void (*on_file)(struct FileNode *),
void (*on_folder)(struct FolderNode *))
{
struct Node *cur = root;
struct FileNode *file;
struct FolderNode *folder;
for (; cur != NULL; cur = cur->next) {
switch (cur->type) {
case TFile:
file = (struct FileNode *)cur;
on_file(file);
break;
case TFolder:
folder = (struct FolderNode *)cur;
on_folder(folder);
walk(folder->children, on_file, on_folder);
break;
}
}
}
Note that we have a sort-of-polymorphic base type, but instead of switching on the type enumeration we could have a more completely polymorphic setup with virtual functions. Just add a function pointer to Node, something like:
void (*visit)(struct Node *self,
void (*on_file)(struct FileNode *),
void (*on_folder)(struct FolderNode *));
and have create_file and create_folder set it to an appropriate function (say, visit_file or visit_folder). Then, instead of switching on the enumerated type, walk would just call
cur->visit(cur, on_file, on_folder);
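Putting that together, the function-pointer version might be sketched as follows. visit_file, visit_folder and the counting callbacks count_file and count_folder are hypothetical names; the type enumeration is dropped because the visit pointer now selects the behavior, and only the fields needed for the walk are kept.

```c
#include <stdlib.h>

struct FileNode;
struct FolderNode;

struct Node {
    const char *name;
    struct Node *next;
    /* "virtual" method chosen by the constructor */
    void (*visit)(struct Node *self,
                  void (*on_file)(struct FileNode *),
                  void (*on_folder)(struct FolderNode *));
};

struct FileNode { struct Node base_; };
struct FolderNode { struct Node base_; struct Node *children; };

void walk(struct Node *root,
          void (*on_file)(struct FileNode *),
          void (*on_folder)(struct FolderNode *))
{
    for (struct Node *cur = root; cur != NULL; cur = cur->next)
        cur->visit(cur, on_file, on_folder);   /* no switch needed */
}

static void visit_file(struct Node *self,
                       void (*on_file)(struct FileNode *),
                       void (*on_folder)(struct FolderNode *))
{
    (void)on_folder;
    on_file((struct FileNode *)self);
}

static void visit_folder(struct Node *self,
                         void (*on_file)(struct FileNode *),
                         void (*on_folder)(struct FolderNode *))
{
    struct FolderNode *folder = (struct FolderNode *)self;
    on_folder(folder);
    walk(folder->children, on_file, on_folder);   /* recurse into children */
}

struct Node *create_file(const char *name)
{
    struct FileNode *file = malloc(sizeof *file);
    if (file == NULL) return NULL;
    file->base_ = (struct Node){ name, NULL, visit_file };
    return &file->base_;
}

struct Node *create_folder(const char *name)
{
    struct FolderNode *folder = malloc(sizeof *folder);
    if (folder == NULL) return NULL;
    folder->base_ = (struct Node){ name, NULL, visit_folder };
    folder->children = NULL;
    return &folder->base_;
}

/* Hypothetical callbacks that just count what the walk visits. */
static int files_seen, folders_seen;
static void count_file(struct FileNode *f) { (void)f; files_seen++; }
static void count_folder(struct FolderNode *f) { (void)f; folders_seen++; }
```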

Reusing existing linked list API implementation

In the existing code base I have a linked list implementation (adding nodes, insertion, deletion, traversal) for the following structure:
typedef struct tagDirInfo
{
char *pdirName;
struct tagDirInfo *__next;
struct tagDirInfo *__prev;
}DIR_HEADER;
Let's assume that char *pdirName points to the data part.
I want to wrap the data part and reuse the existing APIs, so that the new linked list has the following as its data part:
typedef struct printJob
{
char labelName[BUF_LEN];
int priStatus;
time_t time_stamp;
}PRINTJOB;
I think if I do something like:
PRINTJOB newJob;
/* Fill in newJob structure */
DIR_HEADER *newNode;
newNode->pdirName = (char*)newJob;
newNode->__next = NULL;
newNode->__prev = NULL;
Doing so will fill in the linked list structure.
But how can I access labelName data field through pdirName field of the linked list structure?
Do you mean you want to do something like this:
printf("labelName : %s\n", ((PRINTJOB *)(newNode->pdirName))->labelName);
However, your code has a mistake. To correct it:
Change
newNode->pdirName = (char*)newJob;
to
newNode->pdirName = (char*)&newJob;
You should use templates (if you can use C++).
char* Labelname = ((PRINTJOB*) newNode->pdirName)->labelName;
By the way, either newJob should be of type PRINTJOB* rather than PRINTJOB, or you need to take its address (&newJob) before casting.
A better solution would be the following:
typedef struct _LINKED_LIST {
struct _LINKED_LIST *_Next;
struct _LINKED_LIST *_Prev;
} LINKED_LIST;
typedef struct {
LINKED_LIST List;
char labelName[BUF_LEN];
int priStatus;
time_t time_stamp;
} MY_LINKED_LIST_DATA;
MY_LINKED_LIST_DATA* MyData = malloc(sizeof(MY_LINKED_LIST_DATA));
if (MyData != NULL) {
    MyData->List._Next = NULL;   /* List is an embedded struct, so use '.' rather than '->' */
    MyData->List._Prev = NULL;
}
Your data always contains linked list specific fields _Next and _Prev.
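With the links embedded first in the record, one generic routine can link and walk any such records, and the containing record is recovered with a cast. The sketch below assumes hypothetical helpers (list_push_front, new_job, label_at) and a BUF_LEN of 32, since the question does not give its value; it renames _Next/_Prev to Next/Prev and drops the leading underscore from the tag, because identifiers beginning with an underscore and an uppercase letter are reserved in C.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_LEN 32   /* assumed value */

typedef struct LINKED_LIST {
    struct LINKED_LIST *Next;
    struct LINKED_LIST *Prev;
} LINKED_LIST;

typedef struct {
    LINKED_LIST List;          /* first member: a record pointer is a list pointer */
    char labelName[BUF_LEN];
    int priStatus;
    time_t time_stamp;
} MY_LINKED_LIST_DATA;

/* Generic: links any record that embeds LINKED_LIST, knowing nothing else. */
void list_push_front(LINKED_LIST **head, LINKED_LIST *entry)
{
    entry->Prev = NULL;
    entry->Next = *head;
    if (*head != NULL) (*head)->Prev = entry;
    *head = entry;
}

MY_LINKED_LIST_DATA *new_job(const char *label, int status)
{
    MY_LINKED_LIST_DATA *job = malloc(sizeof *job);
    if (job == NULL) return NULL;
    strncpy(job->labelName, label, BUF_LEN - 1);
    job->labelName[BUF_LEN - 1] = '\0';   /* strncpy may not terminate */
    job->priStatus = status;
    job->time_stamp = time(NULL);
    job->List.Next = job->List.Prev = NULL;
    return job;
}

/* Walk the generic links and cast back to the record to read labelName. */
const char *label_at(LINKED_LIST *head, int index)
{
    for (; head != NULL; head = head->Next, index--)
        if (index == 0)
            return ((MY_LINKED_LIST_DATA *)head)->labelName;
    return NULL;
}
```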

Useless variable name in C struct type definition

I'm implementing a linked list in C. Here's a struct that I made, which represents the linked list:
typedef struct llist {
struct lnode* head; /* Head pointer either points to a node with data or NULL */
struct lnode* tail; /* Tail pointer either points to a node with data or NULL */
unsigned int size; /* Size of the linked list */
} list;
Isn't the "llist" basically useless? When a client uses this library and makes a new linked list, he would have the following declaration:
list myList;
So typing llist just before the opening brace is practically useless, right? The following code basically does the same job:
typedef struct {
struct lnode* head; /* Head pointer either points to a node with data or NULL */
struct lnode* tail; /* Tail pointer either points to a node with data or NULL */
unsigned int size; /* Size of the linked list */
} list;
You need to give a struct a name if you will reference it in its declaration.
typedef struct snode {
struct snode* next;
struct snode* prev;
int id;
} node;
But if you won't reference the struct inside itself, you don't need to give it a name.
EDIT
Notice that typedef and struct are two different constructs in C.
struct is for creating complex types:
struct snode {
struct snode* next;
struct snode* prev;
int id;
};
This reads as: make a structure called snode that stores two pointers to its own type (next and prev) and an int (id).
And typedef is for making type aliases:
typedef struct snode node;
This reads as: make a type alias called node for struct snode.
Yes, you are correct. It is only a matter of habit or convention to explicitly name the struct in addition to the typedef.
Note that there is little to no cost either way, since llist is not a variable and does not take up memory. It is like the difference between naming a variable i or index - the compiled form is the same, but one may be more readable than the other.
It's useless in that particular case but, if you wanted a pointer to that struct within the struct itself, it would be needed.
That's because the struct is known at the opening brace while the typedef isn't known until the final semicolon (simplistic, but good enough here).
So you would need it for something like:
typedef struct sNode { // structure can be used now
int payload;
struct sNode *next; // cannot use typedef yet
} tNode; // typedef can be used now
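For example, once the declaration is complete, tNode and struct sNode name the same type and can be used interchangeably. A short sketch; sum_list is a hypothetical helper.

```c
#include <stddef.h>

typedef struct sNode {     /* tag is usable from here on */
    int payload;
    struct sNode *next;    /* must use the tag: tNode is not declared yet */
} tNode;                   /* typedef name usable from here on */

/* After the declaration, tNode and struct sNode are the same type. */
int sum_list(const tNode *n)
{
    int sum = 0;
    for (; n != NULL; n = n->next)
        sum += n->payload;
    return sum;
}
```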
You could turn this around: not the structure tag, but the whole typedef is superfluous.
struct snode {
struct snode* next;
struct snode* prev;
int id;
};
Now you can declare a pointer with:
struct snode *ptr;
You can even declare an array of them:
struct snode mynodes[10];
You'll have to type the struct keyword, but that won't hurt the compiler or the human reader (look at that syntax highlighting!).
You could even declare a pointer to an unknown type (at this moment of compilation) using an incomplete type:
struct xnode *xptr=NULL;
That will come in handy when you want to create an API for some library whose actual implementation is not known to the caller:
struct gizmo *open_gizmo(char *path, int flags);
int fiddle_with_gizmo(struct gizmo *ptr, int opcode, ...);
Et cetera. A typedef would force the header file to "broadcast" all its internals to the caller, even if that is not needed.

Resources