Coding Style -- Pass by Reference or Pass by Value? - c

In order to simplify the development of future school assignments I decided to create an API (is that what you would call it?) for two data structures I commonly use -- a linked list and a hash table.
In developing each of these I ended up with the following two insert functions:
int list_insert(list *l, char *data, unsigned int idx);
int hash_insert(hash_table **ht, char *data);
The list_insert() function (and all of the list functions) ended up being pass-by-value since I never had any need to directly modify the list * itself unless I was malloc'ing or free'ing it. However, because I wanted to include auto-rehashing in my hash table I found that I had to pass the table by-reference instead of by-value in any function that might force a rehash. Now I end up with syntax like the following:
list_insert(l, "foo", 3);
hash_insert(&ht, "foo");
The difference strikes me as a little odd and I found myself wondering if I should change the list functions to be pass-by-reference as well for consistency's sake -- even though none of my functions would need to leverage it. What's the typical consensus here? Should I only pass-by-reference if my function actually needs to modify its arguments or should I pass-by-reference for the sake of consistency?
Structure definitions:
typedef struct list_node list_node;
struct list_node {
char *data;
list_node *next;
list_node *prev;
};
typedef struct list list;
struct list {
list_node *head;
list_node *tail;
size_t size;
};
typedef struct hash_table hash_table;
struct hash_table {
list **table;
size_t entries;
size_t buckets;
float maxLoad;
unsigned int (*hash)(char*, unsigned int);
};
List functions:
list *list_createList();
list_node *list_createNode();
void list_destroyList(list *l);
void list_destroyNode(list_node *n);
int list_append(list *l, char *data);
int list_insert(list *l, char *data, unsigned int idx);
int list_remove(list *l, char *data, int (*compar)(const void*, const void*));
void list_push(list *l, char *data);
char *list_pop(list *l);
int list_count(list *l, char *data, int (*compar)(const void*, const void*));
int list_reverse(list *l);
int list_sort(list *l, int (*compar)(const void*, const void*));
int list_print(list *l, void (*print)(char *data));
Hash functions:
hash_table *hash_createTable(size_t buckets, float maxLoad, unsigned int (*hash)(char*, unsigned int));
void hash_destroyTable(hash_table *ht);
list *hash_list(const hash_table **ht);
int hash_checkLoad(hash_table **ht);
int hash_rehash(hash_table **ht);
int hash_insert(hash_table **ht, char *data);
void hash_stats(hash_table *ht);
int hash_print(hash_table *ht, void (*print)(char*));

Here is a general rule of thumb:
pass by value if the type is a native type (char, short, int, long, long long, double or float)
pass by reference if it is a union, struct or array
Additional considerations for passing by reference:
use const if it will not be modified
use restrict if pointers will not point to the same address
Sometimes a struct/union seems like the appropriate type, but can be replaced with arrays if the types are similar. This can help with optimization (loop vectorization for example)
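A minimal sketch of the two qualifiers mentioned above. The function name scale_into and its parameters are made up for illustration, and restrict assumes a C99 or later compiler:

```c
#include <stddef.h>

/* Hypothetical helper: stores k * src[i] into dst[i] for i in [0, n).
 * `const` documents that src is only read. `restrict` is the caller's
 * promise that dst and src do not overlap, which may help the compiler
 * vectorize the loop. */
void scale_into(float *restrict dst, const float *restrict src,
                size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```

Calling this with overlapping dst and src would break the restrict promise and is undefined behavior, so the qualifier belongs only where the caller can guarantee distinct buffers.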

That's up to you and takes a little intuition. When passing large structs I pass by reference so that I am not eating up extra stack space and burning cycles copying the struct. But with small structs like yours it may be more efficient to use the stack, depending on your target processor, how often you are using the values, and what your compiler does. Your compiler may break that struct up and put its values into registers.
But if you do pass by reference and do not intend to modify the value, it is best practice to pass a pointer to const, e.g. const list *l. That way there isn't any risk of accidentally modifying the value, and it makes the interface cleaner: the caller knows that the value won't change.
Consistency is nice, and I personally would lean in that direction, especially on a large interface, because it may make things easier in the long run, but I would definitely use const. In doing so you allow the compiler to catch any accidental assignments, so that later you don't need to track down a hard-to-find bug.
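As a rough illustration of the pointer-to-const advice, here is a read-only helper written against the list structures from the question. The name list_length is hypothetical and is not part of the asker's API:

```c
#include <stddef.h>

typedef struct list_node list_node;
struct list_node {
    char *data;
    list_node *next;
    list_node *prev;
};

typedef struct list {
    list_node *head;
    list_node *tail;
    size_t size;
} list;

/* Hypothetical read-only helper: counts nodes by walking the list.
 * Because l points to const, a stray `l->size = 0;` inside this
 * function would be rejected at compile time. */
size_t list_length(const list *l)
{
    size_t n = 0;
    for (const list_node *p = l->head; p != NULL; p = p->next)
        n++;
    return n;
}
```

Functions that do modify the list, such as list_insert(), would keep the plain list * parameter, so the signature itself tells the caller which calls can change the list.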
See also: Passing a struct to a function in C

Related

Casting function pointers

I am writing a function that receives a pointer to a comparison function and an array of MyStructs and is supposed to sort the array according to the comparison function:
void myStructSort(
struct MyStruct *arr,
int size,
int (*comp)(const struct MyStruct *, const struct MyStruct *)) {
qsort(arr, size, sizeof(struct MyStruct), comp);
}
Unfortunately this doesn't compile because qsort expects the comparator to receive void * arguments and not const struct MyStruct *. I thought of several bad solutions and was wondering what the correct solution is.
Option 1
Cast comp to int (*)(const void *, const void*). This compiles but is undefined behavior (see this SO question).
Option 2
Create a global variable int (*global_comp)(const struct MyStruct *, const struct MyStruct *) and set global_comp=comp inside myStructSort. Then create a function:
int delegatingComp(const void *a, const void *b) {
return global_comp((const struct MyStruct *)a, (const struct MyStruct *)b);
}
And in myStructSort call qsort(arr, size, sizeof(struct MyStruct), delegatingComp). The problem with this is the icky global variable.
Option 3
Reimplement qsort. This is functionally safe but very bad practice.
Is there a magical perfect fourth option?
Edit
I can't change the API of myStructSort and I am compiling my code using gcc -std=c99 -Wall -Wextra -Wvla.
Option 2 breaks thread-safety, so I wouldn't choose that one.
Option 3 is just plain wrong as you point out. There is no reason to re-implement quicksort and potentially make a mistake.
Option 1 is UB but it will work on any sane compiler. If you choose this option be sure to add a comment.
I would also consider:
Option 4. Redesign the interface of myStructSort to take int (*)(const void *, const void*) or scrap it entirely and call qsort directly. Basically send it back to the architect, because he made a poor design choice.
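A minimal sketch of what Option 4 could look like, assuming the interface may be changed after all. The key field and the compare_by_key comparator are hypothetical, since the question does not show the contents of MyStruct:

```c
#include <stdlib.h>

/* Hypothetical layout; the question does not show MyStruct's fields. */
struct MyStruct {
    int key;
};

/* Comparator with exactly the signature qsort expects. The conversion
 * back to the real type happens on the object pointers, inside the
 * function, which is well defined. */
int compare_by_key(const void *a, const void *b)
{
    const struct MyStruct *x = a;
    const struct MyStruct *y = b;
    return (x->key > y->key) - (x->key < y->key);
}

/* Redesigned interface: the comparator type matches qsort, so the
 * function pointer is passed through without any cast. */
void myStructSort(struct MyStruct *arr, size_t size,
                  int (*comp)(const void *, const void *))
{
    qsort(arr, size, sizeof *arr, comp);
}
```

The cost of this design is that every comparator has to convert its arguments itself, but no function pointer is ever converted to an incompatible type.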
The following approach only works with gcc, because it relies on a GNU extension (nested functions). For details, see https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Nested-Functions.html#Nested-Functions
First, let's make sure the prototype of qsort has this form:
void qsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
then you can:
void myStructSort(
struct MyStruct *arr,
int size,
int (*comp)(const struct MyStruct *, const struct MyStruct *)) {
int comparator(const void * a, const void *b) {
return comp((const struct MyStruct *)a, (const struct MyStruct *)b);
}
qsort(arr, size, sizeof *arr, comparator);
}
But again, since it uses a GNU extension, don't expect much portability.
ABOUT YOUR COMMENT: modern gcc defaults to a GNU dialect rather than an ISO one. Specifically, recent gcc versions default to gnu11 or later, while older ones used gnu89. So I don't know about your command-line parameters, but if -std is not set, this will work.
The following example is taken from info gcc, in case the link goes dead. It shows a closure-like use of a nested function:
bar (int *array, int offset, int size)
{
int access (int *array, int index)
{ return array[index + offset]; }
int i;
/* ... */
for (i = 0; i < size; i++)
/* ... */ access (array, i) /* ... */
}
If you are using gcc with glibc 2.8 or later, you can use the qsort_r function, which allows you to specify a comparator function with an additional user-supplied argument:
void qsort_r(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *, void *),
void *arg);
This is not portable, of course, and it requires you to define the feature-test macro:
#define _GNU_SOURCE
(On FreeBSD -- and, presumably, Mac OS X -- there is a similar but incompatible qsort_r; the difference is that the user-supplied context argument is provided as the first argument to the comparison function, rather than the last argument.)
But if you have it, it allows you to avoid the global in option 2:
/* This struct avoids the issue of casting a function pointer to
* a void*, which is not guaranteed to work. It might not be
* necessary, but I know of no guarantees.
*/
typedef struct CompContainer {
int (*comp_func)(const struct MyStruct *, const struct MyStruct *);
} CompContainer;
int delegatingComp(const void *a, const void *b, void* comp) {
return ((CompContainer*)comp)->comp_func((const struct MyStruct *)a,
(const struct MyStruct *)b);
}
void myStructSort(
struct MyStruct *arr,
int size,
int (*comp_func)(const struct MyStruct *,
const struct MyStruct *)) {
CompContainer comp = {comp_func}; /* not const: qsort_r takes a plain void * */
qsort_r(arr, size, sizeof(struct MyStruct), delegatingComp, &comp);
}
The correct approach is to cast from void const * to MyStruct const * in the comparison function.
This is well-defined for the first object, because the pointer that was passed to the comparison function was created by a cast from MyStruct const * to void const *, and casting a pointer to void back to its original type is allowed (and it's really the only thing that is).
For the other array members, it is assumed that casting void const * to char const *, adding the offset of the object, generated by multiplying the object size with the position of the object in the array, and casting that back to void const * will give a pointer that can be cast back to MyStruct const *.
That is a bold assumption, but usually works out. There may be corner cases where this doesn't work, but in general compilers pad any struct foo to a multiple of its alignment to ensure that array members' start addresses have a distance of sizeof(struct foo).
Casting function pointers is generally unsafe and needs to be avoided, as different data types may have different representations -- for example, a void * must be able to express every possible address as it could have been converted from a char *, while a MyStruct * is guaranteed to have a few of the least significant bits clear as any valid object would be aligned -- so it is entirely possible that the calling convention for these types could be different.
The only sane option is to re-write the interface you've created, or make a new one.
I've done something very similar with bubble sort on another answer of mine.
In short, with C, you want your sort function to be of the form:
void* bubbleSort(void* arr, int (*compareFcn)(void*, void*),
size_t sizeOfElement, size_t numElements)
And your comparison function to be of the form:
int compareFunction(void *a, void *b);
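The linked answer is not reproduced here, so the following is only a sketch of how a sort with that signature might be implemented. The helper swap_bytes and the comparator compareInts are hypothetical names:

```c
#include <stddef.h>

/* Swap two elements of `size` bytes, one byte at a time. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

/* Sorts arr in place and returns it. compareFcn must return a positive
 * value when its first argument should come after its second. */
void *bubbleSort(void *arr, int (*compareFcn)(void *, void *),
                 size_t sizeOfElement, size_t numElements)
{
    unsigned char *base = arr;
    for (size_t pass = 0; pass + 1 < numElements; pass++) {
        for (size_t i = 0; i + 1 < numElements - pass; i++) {
            unsigned char *left = base + i * sizeOfElement;
            unsigned char *right = left + sizeOfElement;
            if (compareFcn(left, right) > 0)
                swap_bytes(left, right, sizeOfElement);
        }
    }
    return arr;
}

/* Example comparator: converts the void pointers back to int pointers. */
int compareInts(void *a, void *b)
{
    int x = *(int *)a;
    int y = *(int *)b;
    return (x > y) - (x < y);
}
```

Only object pointers are converted here, never the function pointer, which is the point of giving the comparator void * parameters in the first place.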

Is there a way to pass a function pointer with generic arguments?

I am implementing a generic singly linked list where list nodes store a pointer to their data.
typedef struct sll_node
{
void *data;
struct sll_node *next;
} sll_node;
To implement a generic find subroutine that works with any kind of data, I wrote it so that it takes as an argument a function pointer to the comparison function as follows:
/* eq() must take 2 arguments. ex: strcmp(char *, char *) */
sll_node *sll_find(void *data, int (*eq)(), sll_node *root);
You can pass the appropriate function pointer that works with the data type at hand. So if you store strings in the list nodes, you can pass strcmp as the eq() function, and so on. It works, but I'm still not satisfied.
Is there a way to explicitly specify the number of comparison function parameters without giving up its generality?
I tried this at first:
sll_node *sll_find(void *data, int (*eq)(void *, void *), sll_node *root);
I expected it to work. But no (edit: it compiles with a warning but I have -Werror on!), I had to write a wrapper function around strcmp to make it conform to the eq prototype.
I then tried:
sll_node *sll_find(void *data, int (*eq)(a, b), sll_node *root);
or:
typedef int (*equality_fn)(a, b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
which both wouldn't compile since: "a parameter list without types is only allowed in a function definition"
To use strcmp without a wrapper or a cast, the declaration needs to be
sll_node *findNode(void *data, int (*eq)(const char *, const char *), sll_node *root);
On the other hand, if you declare the args as const void *, then you can avoid the wrapper by casting strcmp to the appropriate type.
Method 1: direct cast, messy but effective
result = findNode( "hello", (int(*)(const void *, const void *))strcmp, root );
Method 2: typedef the comparison function, and then use it to cast
typedef int (*cmpfunc)(const void *, const void *);
result = findNode( "world", (cmpfunc)strcmp, root );
Edit: After reading this post that @WilburVandrsmith linked, I've decided to leave this answer as is. I leave it up to the reader to decide whether the proposed cast violates the following paragraph from the specification:
If a converted pointer is used to call a function whose type is not
compatible with the pointed-to type, the behavior is undefined.
Compatible or not compatible, that is the question, you decide.
Your last attempted solution is the closest to being correct. The parameters in your defined-type function pointer need to be declared with their data types, just like you would with a regular function declaration, like so:
typedef int (*equality_fn)(char *a, char *b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
UPDATE
To make it more generic use void pointers, and then type cast the passed void pointers to the needed data type in the matching function definition for equality_fn:
typedef int (*equality_fn)(void *a, void *b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
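A possible implementation of sll_find against this typedef, as a sketch only. It assumes eq returns 0 on a match (the strcmp convention from the question), and int_eq is a hypothetical comparator for a list that stores pointers to int:

```c
#include <stddef.h>

typedef struct sll_node {
    void *data;
    struct sll_node *next;
} sll_node;

typedef int (*equality_fn)(void *a, void *b);

/* Returns the first node whose data matches `data`, or NULL if there is
 * none. Assumes eq returns 0 on a match, like strcmp. */
sll_node *sll_find(void *data, equality_fn eq, sll_node *root)
{
    for (sll_node *n = root; n != NULL; n = n->next) {
        if (eq(n->data, data) == 0)
            return n;
    }
    return NULL;
}

/* Hypothetical comparator for int data: converts the void pointers back
 * to int pointers before comparing. Returns 0 when the values match. */
int int_eq(void *a, void *b)
{
    return *(int *)a != *(int *)b;
}
```

The conversion from void * happens inside the comparator, so sll_find itself never needs to know what type the nodes hold.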
Something else important to remember is that a pointer is a pointer is a pointer, regardless of what it's pointing at or how it was originally defined. So, you can have some function pointer, or a void pointer, or a pointer to a byte, a char, an int--anything--as long as you handle it properly in your code and cast it back to a valid type before attempting to use it.
Something else that most coders don't take much advantage of in C is that function names themselves are really just addresses that are called at run-time, and so they are also pointers. ;)
My solution to this conundrum would be (avoiding pointer typedefs, incidentally):
typedef int equality_fn(const void *a, const void *b);
sll_node *sll_find(void *data, equality_fn *eq, sll_node *root);
Then make all your comparators be of type equality_fn. If you need to actually have a function then so be it:
equality_fn eq_strcmp; // a prototype
// ...
int eq_strcmp(const void *a, const void *b) { return strcmp(a, b); }
Gain lots of type safety in exchange for a potential picoscopic runtime penalty; which end of this trade you want to be on depends on your application.

function pointers for objects in C

typedef struct node{
int term;
struct node *next;
}node;
typedef void(*PTR )(void *);
typedef void(*PTR1)(void *,int,int);
typedef int(*PTR2)(void *,int);
typedef void(*PTR3)(void *,int);
typedef void(*PTR4)(void *,void *,void *);
typedef struct list{
node *front,*rear;
PTR3 INSERT;
PTR *MANY;
PTR DISPLAY,SORT,READ;
PTR4 MERGE;
}list;
void constructor(list **S)
{
(*S)=calloc(1,sizeof(list));
(*S)->front=(*S)->rear=NULL;
(*S)->INSERT=push_with_value;
(*S)->READ=read;
(*S)->SORT=sort;
(*S)->DISPLAY=display;
(*S)->MERGE=merger;
(*S)->MANY=calloc(2,sizeof(PTR));
(*S)->MANY[1]=read;
}
int main()
{
list *S1,*S2,*S3;
constructor(&S1);
constructor(&S2);
constructor(&S3);
S1->MANY[1](S1);
S1->SORT(S1);
S1->DISPLAY(S1);
return 0;
}
The void * parameter in all such functions gets typecast to list * inside the function.
Is there any way I can call S1->READIT(S1); instead, by giving MANY[1] another name such as READIT?
I intend to create a common header file, so that I can use it for all my programs.
Since I don't know how many function pointers I will need I intend to create a dynamic array of each function pointer type.
typedef struct list{
node *front,*rear;
PTR3 INSERT;
PTR READIT;
PTR DISPLAY,SORT,READ;
PTR4 MERGE;
}list;
...
(*S)->READIT = read;
...
S1->READIT(S1);
Take a look at the Linux kernel implementation of (doubly linked) lists, as defined here (and following/referenced files). They are used all over the place. Most of the manipulation is done in macros to e.g. run an operation on all nodes of the list.
If what you are trying to define is getting too complicated, step back and look for simpler alternatives. Don't generalize beforehand; if the generalization isn't used it is a waste; if something (slightly) different is later needed, it is a poor match that requires workarounds or even reimplementation.
Take a look at the interfaces exposed by the C++ STL list, those folks have thought long and hard on the matter (in a different setting, though).
Or just bite the bullet and use C++ if you want full-fledged OOP.

C Generic ADT with function pointers

I'm writing a generic list adt and this is what I have in the header so far. From what I know this is usually how it's done.
typedef struct _node {
void *data;
struct _node *next;
} Node;
typedef struct {
Node *dummy;
int (*comparePtr) (void *d1, void *d2);
void (*destroyPtr) (void *data);
} List;
List *ListCreate (int (*comparePtr) (void *d1, void *d2), void (*destroyPtr) (void *data));
void ListDestroy (List *node);
void ListAddToTail (List *list, void *data);
int ListContains (List *list, void *data);
void *ListGetFromIndex (List *list, int index);
It works fine on the implementation side. What I noticed is that in order to use this adt to store integers I have to make calls in this fashion
int a = 5;
ListAddToTail (list, &a);
whereas in a perfect world I'd be able to do this
ListAddToTail (list, 55);
So the question is is it possible to modify this to allow me to pass in any type of data, pointer or non-pointer, non-pointer being mainly primitive types like integers and characters?
There's no clean, completely nice way to solve this. You have a few options:
On most platforms you can simply get away with stuffing an integer in a void *. It's messy but it works pretty well, especially if you silence the warnings
Define your own boxing functions / macros that allocate the required space and give you back a pointer. You can probably make a really nice macro using typeof tricks. But then you have to remember to free that space
The main issue should be uniformity. Your list lets people store pointers. You should let them deal with questions like "how do I get a pointer to my data".
EDIT
I just made a primitive "box" macro:
#define box(value) \
({ \
typeof(value) *ptr = malloc(sizeof *ptr); \
*ptr = value; \
ptr; \
})
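The macro above relies on GNU extensions (statement expressions and typeof) and does not check the result of malloc. A portable alternative for a single type might look like the sketch below, where box_int is a hypothetical helper name:

```c
#include <stdlib.h>

/* Hypothetical helper: copies an int to the heap and returns a pointer
 * to the copy, or NULL if the allocation fails. The caller owns the
 * memory and must eventually free() it. */
int *box_int(int value)
{
    int *ptr = malloc(sizeof *ptr);
    if (ptr != NULL)
        *ptr = value;
    return ptr;
}
```

With this helper the call from the question becomes ListAddToTail(list, box_int(55)). Passing free as destroyPtr to ListCreate would then release each box when the list is destroyed, assuming ListDestroy calls destroyPtr on every element, which the question does not show.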

Iterator in C language

Has anyone tried providing support for iterators in C?
I am not looking for an exact equivalent of C++ STL iterators; minimal support, or just an idea to start from, would be a good starting point for me.
I am developing a container library similar to the STL but with minimal support, so I need this kind of functionality in those containers.
I am looking to define a set of algorithm interfaces (similar to the STL), for example sort, which would take begin and end iterators and should work with any container.
Pointers can serve this function. container.begin() is easy, and container.end() doesn't take too much work.
Consider
Value array[N];
typedef Value* iterator;
iterator array_begin(Value a[]){ return &a[0];}
iterator array_end(Value a[], int n){ return &a[n];}
iterator array_next(iterator i) { return ++i;}
iterator it = array_begin(array);
iterator end = array_end(array,N);
for (;it < end; it=array_next(it))
{
Value v = *it;
}
For other containers like lists, you can use NULL as end. Same for trees, but the next function needs to maintain state. (or the iterator is a pointer to a struct with state updated by calls to next(it)).
Take a look at linked lists. A node includes a "next" pointer that one can use to iterate through the list, in a manner analogous to C++ iterators:
typedef struct Node {
...
struct Node *next;
} Node;
...
Node *iter, *firstNode, *nodeList;
/* set firstNode and populate nodeList */
for (iter = firstNode; iter != NULL; iter = iter->next) {
/* iterate through list */
}
It's not a C++ iterator, but hopefully this gives an idea of one way to approach this in C.
If you are allowed to use LGPL code in your project, have a look at GLib instead of reinventing the wheel. GLib also lets you develop in a way that is quite portable at the source-code level.
Have a look at g_list_first() and g_list_next(), which implement the functionality of an iterator on the list. There is even a g_list_foreach().
http://library.gnome.org/devel/glib/stable/glib-Doubly-Linked-Lists.html
You'd need a standardized way of incrementing the iterator. In C++, that's just the overloaded operator++(). Your container needs an associated function that returns a pointer to the next element. This incrementing function would need to be passed as a pointer to any generalized routine that can accept an iterator in your library.
For example, If I want to write a function that returns the max element from the container, I need not only the comparison function (the equivalent of operator<()), I need an iterator-incrementing function (the equivalent of operator++()).
So ensuring that I can accept a pointer to your incrementing function is the key requirement.
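The idea can be sketched as follows. Every name here (generic_max, next_fn, get_fn, less_fn, int_node) is hypothetical, and the protocol assumes that a NULL position marks the end of the container:

```c
#include <stddef.h>

/* Hypothetical iterator protocol: `next` advances an opaque position and
 * returns NULL at the end; `get` returns the element at a position. */
typedef void *(*next_fn)(void *pos);
typedef void *(*get_fn)(void *pos);
typedef int (*less_fn)(const void *a, const void *b);

/* Returns a pointer to the largest element, or NULL for an empty
 * container. Works with any container that supplies next and get. */
void *generic_max(void *first, next_fn next, get_fn get, less_fn less)
{
    if (first == NULL)
        return NULL;
    void *best = get(first);
    for (void *pos = next(first); pos != NULL; pos = next(pos)) {
        void *candidate = get(pos);
        if (less(best, candidate))
            best = candidate;
    }
    return best;
}

/* One container that follows the protocol: a singly linked list of int. */
typedef struct int_node {
    int value;
    struct int_node *next;
} int_node;

void *int_node_next(void *pos) { return ((int_node *)pos)->next; }
void *int_node_get(void *pos)  { return &((int_node *)pos)->value; }

int int_less(const void *a, const void *b)
{
    return *(const int *)a < *(const int *)b;
}
```

Here int_node_next plays the role of operator++() and int_less the role of operator<(). A different container would only need to supply its own next and get functions to reuse generic_max.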
This is what I came up with:
typedef struct PWDict PWDict;
typedef struct PWDictIterator PWDictIterator;
typedef struct PWDictImplementation
{
PWDict *(*create)(const struct PWDictImplementation *impl, size_t elements);
void (*destroy)(PWDict *dict);
unsigned int (*size)(const PWDict *dict);
unsigned int (*sizeInBytes)(const PWDict *dict);
int (*get)(const PWDict *dict, const char *key, char *output, size_t size);
int (*set)(PWDict *dict, const char *key, const char *value);
PWDictIterator *(*iteratorCreate)(const PWDict *dict);
void (*iteratorBegin)(PWDictIterator *it);
void (*iteratorEnd)(PWDictIterator *it);
void (*iteratorDestroy)(PWDictIterator *it);
const char *(*iteratorGetKey)(const PWDictIterator *it);
const char *(*iteratorGetValue)(const PWDictIterator *it);
int (*iteratorSetValue)(PWDictIterator *it, const char *value);
void (*iteratorNext)(PWDictIterator *it);
}
PWDictImplementation;
struct PWDict
{
PWDictImplementation *impl;
};
struct PWDictIterator
{
PWDict *dict; /* get iterator implementation from the dict implementation */
};
PW is our project prefix.
We just needed a dictionary (string-string map) like container.
I found an open-source project that implements the STL in C:
http://sourceforge.net/projects/tstl2cl/

Resources