I am implementing a generic singly linked list where list nodes store a pointer to their data.
typedef struct sll_node
{
    void *data;
    struct sll_node *next;
} sll_node;
To implement a generic find subroutine that works with any kind of data, I wrote it so that it takes as an argument a function pointer to the comparison function as follows:
/* eq() must take 2 arguments. ex: strcmp(char *, char *) */
sll_node *sll_find(void *data, int (*eq)(), sll_node *root);
You can pass the appropriate function pointer that works with the data type at hand. So if you store strings in the list nodes, you can pass strcmp as the eq() function, and so on. It works, but I'm still not satisfied.
Is there a way to explicitly specify the number of comparison function parameters without giving up its generality?
I tried this at first:
sll_node *sll_find(void *data, int (*eq)(void *, void *), sll_node *root);
I expected it to work. But no (edit: it compiles with a warning but I have -Werror on!), I had to write a wrapper function around strcmp to make it conform to the eq prototype.
I then tried:
sll_node *sll_find(void *data, int (*eq)(a, b), sll_node *root);
or:
typedef int (*equality_fn)(a, b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
which both wouldn't compile since: "a parameter list without types is only allowed in a function definition"
To use strcmp without a wrapper or a cast, the declaration needs to be
sll_node *findNode(void *data, int (*eq)(const char *, const char *), sll_node *root);
On the other hand, if you declare the args as const void *, then you can avoid the wrapper by casting strcmp to the appropriate type.
Method 1: direct cast, messy but effective
result = findNode( "hello", (int(*)(const void *, const void *))strcmp, root );
Method 2: typedef the comparison function, and then use it to cast
typedef int (*cmpfunc)(const void *, const void *);
result = findNode( "world", (cmpfunc)strcmp, root );
Edit: After reading this post that @WilburVandrsmith linked, I've decided to leave this answer as is. I leave it up to the reader to decide whether the proposed cast violates the following paragraph from the specification:
If a converted pointer is used to call a function whose type is not
compatible with the pointed-to type, the behavior is undefined.
Compatible or not compatible, that is the question, you decide.
Your last attempted solution is the closest to being correct. The parameters in your defined-type function pointer need to be declared with their data types, just like you would with a regular function declaration, like so:
typedef int (*equality_fn)(char *a, char *b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
UPDATE
To make it more generic use void pointers, and then type cast the passed void pointers to the needed data type in the matching function definition for equality_fn:
typedef int (*equality_fn)(void *a, void *b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
Something else important to remember is that a pointer is a pointer is a pointer, regardless of what it's pointing at or how it was originally defined. So, you can have some function pointer, or a void pointer, or a pointer to a byte, a char, an int--anything--as long as you handle it properly in your code and cast it back to a valid type before attempting to use it.
Something else that most coders don't take much advantage of in C is that a function name used in an expression converts to a pointer to that function, so function names can be passed around like any other pointer value. ;)
My solution to this conundrum would be (avoiding pointer typedefs, incidentally):
typedef int equality_fn(const void *a, const void *b);
sll_node *sll_find(void *data, equality_fn *eq, sll_node *root);
Then make all your comparators be of type equality_fn. If you need to actually have a function then so be it:
equality_fn eq_strcmp; // a prototype
// ...
int eq_strcmp(const void *a, const void *b) { return strcmp(a, b); }
Gain lots of type safety in exchange for a potential picoscopic runtime penalty. Which end of this trade you want to be on depends on your application.
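To make the wrapper approach concrete, here is a minimal sketch of how sll_find might look with the equality_fn type above. The body of sll_find is an assumption (the question only shows its declaration), and I have made the data parameter const void * for convenience; eq() is treated like strcmp, where 0 means "equal".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct sll_node {
    void *data;
    struct sll_node *next;
} sll_node;

/* Function type (not pointer type) for comparison callbacks.
 * Assumption: like strcmp, a return value of 0 means "equal". */
typedef int equality_fn(const void *a, const void *b);

/* Thin wrapper that gives strcmp the equality_fn signature. */
int eq_strcmp(const void *a, const void *b)
{
    return strcmp(a, b);
}

/* Assumed implementation: return the first node whose data compares
 * equal to `data`, or NULL if there is no such node. */
sll_node *sll_find(const void *data, equality_fn *eq, sll_node *root)
{
    assert(eq != NULL);
    for (sll_node *n = root; n != NULL; n = n->next) {
        if (eq(data, n->data) == 0)
            return n;
    }
    return NULL;
}
```

A call then looks like sll_find("beta", eq_strcmp, root), with no cast at the call site.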
Related
I know that if I passed an argument like void (*func)(void *) to a variadic function, I can retrieve the argument like:
void (*func)(void *) = va_arg( args, void (*)(void *) );
What if I pass something like void (** func)(void *)? What is the correct syntax to retrieve an argument of this type using va_arg?
Frankly, your code is not standard-compliant. There is a small restriction on the second argument of the va_arg() macro:
... The parameter type shall be a type name specified such that the type
of a pointer to an object that has the specified type can be obtained
simply by postfixing a * to type. ...
According to this, a notation like void (*)(void *) is unacceptable here, since simply appending * to it does not give you a pointer to a pointer to a function. You may use only typedef-ed aliases:
typedef void (*func_ptr)(void *);
typedef void (**ptr_to_func_ptr)(void *);
func_ptr var1 = va_arg(ap, func_ptr);
ptr_to_func_ptr var2 = va_arg(ap, ptr_to_func_ptr);
Same as you've mentioned:
typedef void (** func_t)(void *);
func_t ppf;
va_list vl;
va_start(vl,n);
ppf = va_arg(vl, func_t);
...
To make pointers to functions easier to work with, I always use a typedef as follows:
typedef void VoidFn(void *); // Function accepts void * and returns nothing.
That declares the prototype of the function as a typedef, allowing the use of the typedef elsewhere.
That way if you have this function:
void SomeFn(void *arg) {...}
you can declare this pointer:
VoidFn *fnPtr = &SomeFn; // A pointer to such a function.
This then makes it easier to change the prototype independently of the pointer, helping more... sophisticated constructs:
typedef void *VoidPtrFn(void *); // Function takes a void *, returning a void *
void *SomeOtherFn(void *arg) { ... }
VoidPtrFn *otherFnPtr = &SomeOtherFn;
VoidPtrFn **otherFnPtrPtr = &otherFnPtr;
What is the difference between void (*xmlHashScanner)(void *payload, void *data, xmlChar *name) and void *xmlHashScanner(void *payload, void *data, xmlChar *name) in C?
Why do they behave differently?
xmlHashScanner is a user-defined function in the library libxml2.
When I try to redefine this function with a little different prototype: void *xmlHashScanner instead of void (*xmlHashScanner) I have the following error:
error: ‘xmlHashScanner’ redeclared as different kind of symbol
void *xmlHashScanner(void *payload, void *data, xmlChar *name)
^
In file included from /usr/include/libxml2/libxml/parser.h:18:0,
from /home/solar/Bureau/parser/src/diam_dict.c:12:
/usr/include/libxml2/libxml/hash.h:88:16: note: previous declaration of ‘xmlHashScanner’ was here
typedef void (*xmlHashScanner)(void *payload, void *data, xmlChar *name);
I wonder what is the difference between two of them.
void (*xmlHashScanner)(...) declares a pointer to a function returning nothing (void), whereas void *xmlHashScanner(...) is a prototype for a function returning a void *.
void (*xmlHashScanner)(void *payload, void *data, xmlChar *name)
declares xmlHashScanner as a pointer to a function returning void, whereas
void *xmlHashScanner(void *payload, void *data, xmlChar *name)
declares xmlHashScanner as a function returning a pointer to void.
In both declaration and expression syntax, the unary * operator has lower precedence than the postfix [] subscript and () function call operators, so
T *a[N]; // a is an N-element array of pointer to T
T (*a)[N]; // a is a pointer to an N-element array of T
T *f(); // f is a function returning a pointer to T
T (*f)(); // f is a pointer to a function returning T
So you are comparing this:
void *xmlHashScanner(void *payload, void *data, xmlChar *name);
with this:
typedef void (*xmlHashScanner)(void *payload, void *data, xmlChar *name);
The first is a declaration for a function, and the second one is a typedef (defines a type). A function and a type are vastly different things; they are used for different purposes:
A function can be called
A type can be used for making a variable, which can then participate in calculations
This is what your compiler means by "a different kind of symbol" - there is no context (unless by accident) where these pieces of code can do anything similar. Consider, for example, two pieces of code - 42 and &&. They have nothing in common - they are two different kinds of things.
To understand the details of the declaration, which are a bit confusing, use the right-left rule, which is also implemented here.
typedef void (*xmlHashScanner)(whatever) means xmlHashScanner type is a pointer to a function with whatever arguments returning void
void *xmlHashScanner(whatever) means xmlHashScanner is a function with whatever arguments, which returns void *
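The distinction can be sketched with simplified signatures. Everything below (identity, set_flag, scanner_t, run_scanner) is hypothetical and deliberately smaller than the libxml2 declarations; it only illustrates the three kinds of thing involved: a function returning void *, a function returning void, and a typedef naming a pointer-to-function type.

```c
#include <assert.h>
#include <stddef.h>

/* A function returning void *: hands back the pointer it was given. */
void *identity(void *p)
{
    return p;
}

/* A function returning void: stores 1 through the pointer. */
void set_flag(void *p)
{
    *(int *)p = 1;
}

/* scanner_t is a TYPE: pointer to a function taking void * and
 * returning void. It is not itself callable. */
typedef void (*scanner_t)(void *payload);

/* A variable or parameter of that type can hold set_flag and be called. */
void run_scanner(scanner_t scan, void *payload)
{
    assert(scan != NULL);
    scan(payload);
}
```

Declaring a function named scanner_t in the same scope would trigger the same "redeclared as different kind of symbol" diagnostic, because the name already denotes a type.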
I am writing a function that receives a pointer to a comparison function and an array of MyStructs and is supposed to sort the array according to the comparison function:
void myStructSort(
    struct MyStruct *arr,
    int size,
    int (*comp)(const struct MyStruct *, const struct MyStruct *)) {
    qsort(arr, size, sizeof(struct MyStruct), comp);
}
Unfortunately this doesn't compile because qsort expects the comparator to receive void * arguments and not const struct MyStruct *. I thought of several bad solutions and was wondering what the correct solution is.
Option 1
Cast comp to int (*)(const void *, const void*). This compiles but is undefined behavior (see this SO question).
Option 2
Create a global variable int (*global_comp)(const struct MyStruct *, const struct MyStruct *) and set global_comp=comp inside myStructSort. Then create a function:
int delegatingComp(const void *a, const void *b) {
    return global_comp((const struct MyStruct *)a, (const struct MyStruct *)b);
}
And in myStructSort call qsort(arr, size, sizeof(struct MyStruct), delegatingComp). The problem with this is the icky global variable.
Option 3
Reimplement qsort. This is functionally safe but very bad practice.
Is there a magical perfect fourth option?
Edit
I can't change the API of myStructSort and I am compiling my code using gcc c99 -Wall -Wextra -Wvla.
Option 2 breaks thread-safety, so I wouldn't choose that one.
Option 3 is just plain wrong as you point out. There is no reason to re-implement quicksort and potentially make a mistake.
Option 1 is UB but it will work on any sane compiler. If you choose this option be sure to add a comment.
I would also consider:
Option 4. Redesign the interface of myStructSort to take int (*)(const void *, const void*) or scrap it entirely and call qsort directly. Basically send it back to the architect, because he made a poor design choice.
The following approach only works with gcc, because it relies on a GNU extension. For details, please refer to https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Nested-Functions.html#Nested-Functions
First, let's make sure the prototype of qsort has this form:
void qsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
then you can:
void myStructSort(
    struct MyStruct *arr,
    int size,
    int (*comp)(const struct MyStruct *, const struct MyStruct *)) {
    int comparator(const void * a, const void *b) {
        return comp((const struct MyStruct *)a, (const struct MyStruct *)b);
    }
    qsort(arr, size, sizeof *arr, comparator);
}
But again, since it uses a GNU extension, don't expect much portability.
ABOUT YOUR COMMENT: modern gcc defaults to a GNU dialect rather than an ISO one. Specifically, the latest gcc should use the gnu11 standard, while older versions use gnu89. So, I don't know about your command-line parameters, but if -std is not set, this will work.
The following is an example taken from info gcc, in case the link dies. It shows a closure-like use of a nested function:
bar (int *array, int offset, int size)
{
  int access (int *array, int index)
    { return array[index + offset]; }
  int i;
  /* ... */
  for (i = 0; i < size; i++)
    /* ... */ access (array, i) /* ... */
}
If you are using glibc (version 2.8 or later), then you can use its qsort_r function, which allows you to specify a comparator function with an additional user-supplied argument:
void qsort_r(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *, void *),
void *arg);
This is not portable, of course, and it requires you to define the feature-test macro:
#define _GNU_SOURCE
(On FreeBSD -- and, presumably, Mac OS X -- there is a similar but incompatible qsort_r; the difference is that the user-supplied context argument is provided as the first argument to the comparison function, rather than the last argument.)
But if you have it, it allows you to avoid the global in option 2:
/* This struct avoids the issue of casting a function pointer to
* a void*, which is not guaranteed to work. It might not be
* necessary, but I know of no guarantees.
*/
typedef struct CompContainer {
    int (*comp_func)(const struct MyStruct *, const struct MyStruct *);
} CompContainer;
int delegatingComp(const void *a, const void *b, void *comp) {
    return ((CompContainer *)comp)->comp_func((const struct MyStruct *)a,
                                              (const struct MyStruct *)b);
}
void myStructSort(
    struct MyStruct *arr,
    int size,
    int (*comp_func)(const struct MyStruct *,
                     const struct MyStruct *)) {
    /* Not const: qsort_r takes a plain void * for its context argument. */
    CompContainer comp = {comp_func};
    qsort_r(arr, size, sizeof(struct MyStruct), delegatingComp, &comp);
}
The correct approach is to cast from void const * to MyStruct const * in the comparison function.
This is well-defined for the first object, because the pointer that was passed to the comparison function was created by a cast from MyStruct const * to void const *, and casting a pointer to void back to its original type is allowed (and it's really the only thing that is).
For the other array members, it is assumed that casting void const * to char const *, adding the offset of the object, generated by multiplying the object size with the position of the object in the array, and casting that back to void const * will give a pointer that can be cast back to MyStruct const *.
That is a bold assumption, but usually works out. There may be corner cases where this doesn't work, but in general compilers pad any struct foo to a multiple of its alignment to ensure that array members' start addresses have a distance of sizeof(struct foo).
Casting function pointers is generally unsafe and needs to be avoided, as different data types may have different representations -- for example, a void * must be able to express every possible address as it could have been converted from a char *, while a MyStruct * is guaranteed to have a few of the least significant bits clear as any valid object would be aligned -- so it is entirely possible that the calling convention for these types could be different.
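A minimal sketch of doing the conversion inside the comparison function rather than casting the function pointer. The struct layout (a single int key) and the names compareByKey and sortByKey are assumptions made for illustration, since the original MyStruct is not shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type; the original MyStruct is not shown. */
struct MyStruct {
    int key;
};

/* Comparator with exactly the signature qsort expects. The void
 * pointers are converted back to the element type inside the body. */
int compareByKey(const void *a, const void *b)
{
    const struct MyStruct *x = a;
    const struct MyStruct *y = b;
    /* Avoids the overflow that x->key - y->key could cause. */
    return (x->key > y->key) - (x->key < y->key);
}

/* Sorts `count` elements in ascending key order. */
void sortByKey(struct MyStruct *arr, size_t count)
{
    assert(arr != NULL);
    qsort(arr, count, sizeof *arr, compareByKey);
}
```

No function pointer is ever cast here, so the call through qsort's compar parameter uses a compatible type.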
The only sane option is to re-write the interface you've created, or make a new one.
I've done something very similar with bubble sort on another answer of mine.
In short, with C, you want your sort function to be of the form:
void* bubbleSort(void* arr, int (*compareFcn)(void*, void*),
size_t sizeOfElement, size_t numElements)
And your comparison function to be of the form:
int compareFunction(void *a, void *b);
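For illustration only, here is one way a sort with that signature could be written. This is a sketch under my own assumptions, not the code from the other answer: it compares adjacent elements through the callback, swaps them byte by byte, and returns the array it was given. swapBytes and compareInts are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two elements of `size` bytes each, one byte at a time. */
static void swapBytes(unsigned char *a, unsigned char *b, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

/* Sorts in place into ascending order and returns arr (an assumption
 * about what the void * return value is meant to be). */
void *bubbleSort(void *arr, int (*compareFcn)(void *, void *),
                 size_t sizeOfElement, size_t numElements)
{
    assert(arr != NULL && compareFcn != NULL && sizeOfElement > 0);
    unsigned char *base = arr;
    for (size_t pass = 1; pass < numElements; pass++) {
        /* After each pass the largest remaining element is in place. */
        for (size_t i = 0; i + pass < numElements; i++) {
            unsigned char *left = base + i * sizeOfElement;
            unsigned char *right = left + sizeOfElement;
            if (compareFcn(left, right) > 0)
                swapBytes(left, right, sizeOfElement);
        }
    }
    return arr;
}

/* Example comparison function for int elements. */
int compareInts(void *a, void *b)
{
    int x = *(int *)a;
    int y = *(int *)b;
    return (x > y) - (x < y);
}
```

The element type only appears inside the comparison function, which is what keeps the sort itself generic.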
In order to simplify the development of future school assignments I decided to create an API (is that what you would call it?) for two data structures I commonly use -- a linked list and a hash table.
In developing each of these I ended up with the following two insert functions:
int list_insert(list *l, char *data, unsigned int idx);
int hash_insert(hash_table **ht, char *data);
The list_insert() function (and all of the list functions) ended up being pass-by-value since I never had any need to directly modify the list * itself unless I was malloc'ing or free'ing it. However, because I wanted to include auto-rehashing in my hash table I found that I had to pass the table by-reference instead of by-value in any function that might force a rehash. Now I end up with syntax like the following:
list_insert(l, "foo", 3);
hash_insert(&ht, "foo");
The difference strikes me as a little odd and I found myself wondering if I should change the list functions to be pass-by-reference as well for consistency's sake -- even though none of my functions would need to leverage it. What's the typical consensus here? Should I only pass-by-reference if my function actually needs to modify its arguments or should I pass-by-reference for the sake of consistency?
Structure definitions:
typedef struct list_node list_node;
struct list_node {
    char *data;
    list_node *next;
    list_node *prev;
};
typedef struct list list;
struct list {
    list_node *head;
    list_node *tail;
    size_t size;
};
typedef struct hash_table hash_table;
struct hash_table {
    list **table;
    size_t entries;
    size_t buckets;
    float maxLoad;
    unsigned int (*hash)(char*, unsigned int);
};
List functions:
list *list_createList();
list_node *list_createNode();
void list_destroyList(list *l);
void list_destroyNode(list_node *n);
int list_append(list *l, char *data);
int list_insert(list *l, char *data, unsigned int idx);
int list_remove(list *l, char *data, int (*compar)(const void*, const void*));
void list_push(list *l, char *data);
char *list_pop(list *l);
int list_count(list *l, char *data, int (*compar)(const void*, const void*));
int list_reverse(list *l);
int list_sort(list *l, int (*compar)(const void*, const void*));
int list_print(list *l, void (*print)(char *data));
Hash functions:
hash_table *hash_createTable(size_t buckets, float maxLoad, unsigned int (*hash)(char*, unsigned int));
void hash_destroyTable(hash_table *ht);
list *hash_list(const hash_table **ht);
int hash_checkLoad(hash_table **ht);
int hash_rehash(hash_table **ht);
int hash_insert(hash_table **ht, char *data);
void hash_stats(hash_table *ht);
int hash_print(hash_table *ht, void (*print)(char*));
Here is a general rule of thumb:
pass by value if its type is a native type (char, short, int, long, long long, double or float)
pass by reference if it is a union, struct or array
Additional considerations for passing by reference:
use const if it will not be modified
use restrict if pointers will not point to the same address
Sometimes a struct/union seems like the appropriate type, but can be replaced with arrays if the types are similar. This can help with optimization (loop vectorization for example)
That's up to you and takes a little intuition. When passing large structs I pass by reference so that I am not eating up extra stack space and burning cycles copying the struct. But with small structs like yours it may be more efficient to use the stack, depending on your target processor, how often you are using the values, and what your compiler does. Your compiler may break that struct up and put its values into registers.
But if you do pass by reference and do not intend to modify the value, it is best practice to pass a pointer to const, e.g. const list *l. That way there isn't any risk of you accidentally modifying the value, and it makes the interface cleaner: now the caller knows that the value won't be changing.
Consistency is nice and I personally would lean in that direction, especially on a large interface, because it may make things easier in the long run, but I would definitely use const. In doing so you allow the compiler to discover any accidental assignments, so that later you don't need to track down a hard-to-find bug.
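As a small sketch of the pointer-to-const suggestion, here is a read-only query over the list type from the question. The function list_count_str is hypothetical (it is a simplified stand-in for list_count without the comparator argument); the struct definitions are repeated so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct list_node list_node;
struct list_node {
    char *data;
    list_node *next;
    list_node *prev;
};

typedef struct list list;
struct list {
    list_node *head;
    list_node *tail;
    size_t size;
};

/* Hypothetical read-only query: counts nodes whose data equals `data`.
 * Taking `const list *` documents that the list itself is not changed. */
size_t list_count_str(const list *l, const char *data)
{
    assert(l != NULL && data != NULL);
    size_t hits = 0;
    for (const list_node *n = l->head; n != NULL; n = n->next) {
        if (strcmp(n->data, data) == 0)
            hits++;
    }
    return hits;
}
```

With this signature, an accidental assignment such as l->size = 0 inside the function is rejected at compile time.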
See also: Passing a struct to a function in C
I have two functions, each taking a pointer to a different type:
void processA(A *);
void processB(B *);
Is there a function pointer type that would be able to hold a pointer to either function without casting?
I tried to use
typedef void(*processor_t)(void*);
processor_t Ps[] = {processA, processB};
but it didn't work (compiler complains about incompatible pointer initialization).
Edit: Another part of code would iterate through the entries of Ps, without knowing the types. This code would be passing a char* as a parameter. Like this:
Ps[i](data_pointers[j]);
Edit: Thanks everyone. In the end, I will probably use something like this:
void processA(void*);
void processB(void*);
typedef void(*processor_t)(void*);
processor_t Ps[] = {processA, processB};
...
void processA(void *arg)
{
    A *data = arg;
    ...
}
If you typedef void (*processor_t)(); then this will compile in C. This is because an empty argument list leaves the number and types of arguments to a function unspecified, so this typedef just defines a type which is "pointer to function returning void, taking an unspecified number of arguments of unspecified type."
Edit: Incidentally, you don't need the ampersands in front of the function names in the initializer list. In C, a function name in that context decays to a pointer to the function.
It works if you cast them
processor_t Ps[] = {(processor_t)processA, (processor_t)processB};
By the way, if your code is riddled with this kind of thing, with switch statements all over the place to figure out which function you need to call, you might want to take a look at object-oriented programming. I personally don't like it much (especially C++...), but it does a good job of removing this kind of code through virtual functions.
This can be done without casts by using a union:
typedef struct A A;
typedef struct B B;
void processA(A *);
void processB(B *);
typedef union { void (*A)(A *); void (*B)(B *); } U;
U Ps[] = { {.A = processA}, {.B = processB} };
int main(void)
{
    Ps[0].A(0); // 0 used for example; normally you would supply a pointer to an A.
    Ps[1].B(0); // 0 used for example; normally you would supply a pointer to a B.
    return 0;
}
You must call the function using the correct member name; this method only allows you to store one pointer or the other in each array element, not to perform weird function aliasing.
Another alternative is to use proxy functions that do have the type needed when calling with a parameter that is a pointer to char and that call the actual function with its proper type:
typedef struct A A;
typedef struct B B;
void processA(A *);
void processB(B *);
typedef void (*processor_t)();
void processAproxy(char *p) { processA((A *)p); }
void processBproxy(char *p) { processB((B *)p); }
processor_t Ps[] = { processAproxy, processBproxy };
int main(void)
{
    char *a = (char *) address of some A object;
    char *b = (char *) address of some B object;
    Ps[0](a);
    Ps[1](b);
    return 0;
}
I used char * above since you stated you are using it, but I would generally prefer void *.
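Here is a rough sketch of the void * variant of the proxy approach. The definitions of A and B and the bodies of processA and processB are assumptions made so the example compiles; only the proxy pattern itself comes from the answer above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical payload types; the original A and B are not shown. */
typedef struct A { int hits; } A;
typedef struct B { double total; } B;

/* Hypothetical bodies for the two typed functions. */
void processA(A *a) { assert(a != NULL); a->hits += 1; }
void processB(B *b) { assert(b != NULL); b->total += 0.5; }

/* One fully prototyped pointer type for every table entry. */
typedef void (*processor_t)(void *);

/* Proxies: void * converts implicitly to A * or B * in C, so no cast
 * is needed and each real function is called with its own type. */
void processAproxy(void *p) { processA(p); }
void processBproxy(void *p) { processB(p); }

processor_t Ps[] = { processAproxy, processBproxy };
```

The iterating code can then call Ps[i](ptr) without knowing the concrete type, as long as ptr really points to the kind of object that entry expects.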