I am writing a function that receives a pointer to a comparison function and an array of MyStructs and is supposed to sort the array according to the comparison function:
void myStructSort(
struct MyStruct *arr,
int size,
int (*comp)(const struct MyStruct *, const struct MyStruct *)) {
qsort(arr, size, sizeof(struct MyStruct), comp);
}
Unfortunately this doesn't compile because qsort expects the comparator to receive void * arguments and not const struct MyStruct *. I thought of several bad solutions and was wondering what the correct solution is.
Option 1
Cast comp to int (*)(const void *, const void*). This compiles but is undefined behavior (see this SO question).
Option 2
Create a global variable int (*global_comp)(const struct MyStruct *, const struct MyStruct *) and set global_comp=comp inside myStructSort. Then create a function:
int delegatingComp(const void *a, const void *b) {
return global_comp((const struct MyStruct *)a, (const struct MyStruct *)b);
}
And in myStructSort call qsort(arr, size, sizeof(struct MyStruct), delegatingComp). The problem with this is the icky global variable.
Option 3
Reimplement qsort. This is functionally safe but very bad practice.
Is there a magical perfect fourth option?
Edit
I can't change the API of myStructSort, and I am compiling my code with gcc in C99 mode using -Wall -Wextra -Wvla.
Option 2 breaks thread-safety, so I wouldn't choose that one.
Option 3 is just plain wrong as you point out. There is no reason to re-implement quicksort and potentially make a mistake.
Option 1 is UB but it will work on any sane compiler. If you choose this option be sure to add a comment.
I would also consider:
Option 4. Redesign the interface of myStructSort to take int (*)(const void *, const void *), or scrap it entirely and call qsort directly. Basically, send it back to the architect, because he made a poor design choice.
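A minimal sketch of what Option 4 could look like. It assumes a hypothetical struct MyStruct with a single int key member (the question never shows the real fields) and a hypothetical comparator compareByKey. The comparator has exactly the type qsort expects and converts its const void * arguments to const struct MyStruct * inside, so no function pointer is ever cast.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout; the question does not show the real members. */
struct MyStruct {
    int key;
};

/* Option 4: the interface takes the comparator type qsort expects.
 * size is size_t because that is what qsort's nmemb parameter takes. */
void myStructSort(struct MyStruct *arr, size_t size,
                  int (*comp)(const void *, const void *)) {
    qsort(arr, size, sizeof *arr, comp);
}

/* The conversion from const void * happens inside the comparator,
 * so qsort calls the function through its actual type. */
int compareByKey(const void *a, const void *b) {
    const struct MyStruct *x = a;
    const struct MyStruct *y = b;
    return (x->key > y->key) - (x->key < y->key);
}
```

Callers then pass the comparator directly, for example myStructSort(arr, n, compareByKey), with no cast anywhere.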
The following approach only works with GCC, because it relies on a GNU extension. For details, see https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Nested-Functions.html#Nested-Functions
First, make sure the prototype of qsort has this form:
void qsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
Then you can write:
void myStructSort(
struct MyStruct *arr,
int size,
int (*comp)(const struct MyStruct *, const struct MyStruct *)) {
int comparator(const void * a, const void *b) {
return comp((const struct MyStruct *)a, (const struct MyStruct *)b);
}
qsort(arr, size, sizeof *arr, comparator);
}
But again, since it uses a GNU extension, don't expect much portability.
About your comment: modern GCC defaults to a GNU dialect rather than a strict ISO one. GCC 5 switched the default to gnu11 (later releases default to newer GNU dialects), while older versions use gnu89. I don't know your command-line parameters, but if -std is not set, this will work.
The following example is taken from info gcc, in case the link goes dead. It shows a closure-like use of a nested function:
bar (int *array, int offset, int size)
{
int access (int *array, int index)
{ return array[index + offset]; }
int i;
/* ... */
for (i = 0; i < size; i++)
/* ... */ access (array, i) /* ... */
}
If you are using glibc, you can use its qsort_r function (available since glibc 2.8), which lets you specify a comparator function that takes an additional user-supplied argument:
void qsort_r(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *, void *),
void *arg);
This is not portable, of course, and it requires you to define the feature-test macro before including any headers:
#define _GNU_SOURCE
(On FreeBSD -- and, presumably, Mac OS X -- there is a similar but incompatible qsort_r; the difference is that the user-supplied context argument is provided as the first argument to the comparison function, rather than the last argument.)
But if you have it, it allows you to avoid the global in option 2:
/* This struct avoids the issue of casting a function pointer to
* a void*, which is not guaranteed to work. It might not be
* necessary, but I know of no guarantees.
*/
typedef struct CompContainer {
int (*comp_func)(const struct MyStruct *, const struct MyStruct *);
} CompContainer;
int delegatingComp(const void *a, const void *b, void* comp) {
return ((CompContainer*)comp)->comp_func((const struct MyStruct *)a,
(const struct MyStruct *)b);
}
void myStructSort(
struct MyStruct *arr,
int size,
int (*comp_func)(const struct MyStruct *,
const struct MyStruct *)) {
CompContainer comp = {comp_func}; /* not const: qsort_r's last parameter is a plain void * */
qsort_r(arr, size, sizeof(struct MyStruct), delegatingComp, &comp);
}
The correct approach is to cast from void const * to MyStruct const * in the comparison function.
This is well-defined for the first object, because the pointer that was passed to the comparison function was created by a cast from MyStruct const * to void const *, and casting a pointer to void back to its original type is allowed (and it's really the only thing that is).
For the other array members, it is assumed that converting void const * to char const *, adding the object's offset (the object size multiplied by the object's position in the array), and converting the result back to void const * yields a pointer that can be converted back to MyStruct const *.
That is a bold assumption, but usually works out. There may be corner cases where this doesn't work, but in general compilers pad any struct foo to a multiple of its alignment to ensure that array members' start addresses have a distance of sizeof(struct foo).
Casting function pointers is generally unsafe and needs to be avoided, as different data types may have different representations -- for example, a void * must be able to express every possible address as it could have been converted from a char *, while a MyStruct * is guaranteed to have a few of the least significant bits clear as any valid object would be aligned -- so it is entirely possible that the calling convention for these types could be different.
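The address arithmetic described above can be checked with a short sketch. struct MyStruct is again a hypothetical layout (an int key plus a trailing char, so the type likely carries tail padding), and elementAt is a hypothetical helper that locates element i from a const void * base the way a byte-oriented sort routine would. Because sizeof applied to a struct includes any trailing padding, the computed address coincides with &arr[i].

```c
#include <stddef.h>

/* Hypothetical layout; the question does not show the real members. */
struct MyStruct {
    int key;
    char tag;   /* trailing member, so the struct likely has tail padding */
};

/* Returns a pointer to element i of an array known only as a
 * const void * base, which is how qsort sees the array. */
const struct MyStruct *elementAt(const void *base, size_t i) {
    const char *bytes = base;                            /* void * -> char * */
    const void *elem = bytes + i * sizeof(struct MyStruct);
    return elem;                                         /* void * -> MyStruct * */
}
```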
The only sane option is to rewrite the interface you've created, or make a new one.
I've done something very similar with bubble sort in another answer of mine.
In short, with C, you want your sort function to be of the form:
void* bubbleSort(void* arr, int (*compareFcn)(void*, void*),
size_t sizeOfElement, size_t numElements)
And your comparison function to be of the form:
int compareFunction(void *a, void *b);
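A minimal sketch of a sort with that interface follows. The linked answer's code is not reproduced here, so the body (byte-wise swapping of elements, stopping early once a pass makes no swaps) and the example comparator compareInts are assumptions made for illustration.

```c
#include <stddef.h>

/* Swaps two elements of `size` bytes, one byte at a time. */
static void swapBytes(unsigned char *a, unsigned char *b, size_t size) {
    for (size_t k = 0; k < size; k++) {
        unsigned char tmp = a[k];
        a[k] = b[k];
        b[k] = tmp;
    }
}

/* Generic bubble sort: elements are addressed as raw bytes and compared
 * through the caller's void * comparator. Returns arr for convenience. */
void *bubbleSort(void *arr, int (*compareFcn)(void *, void *),
                 size_t sizeOfElement, size_t numElements) {
    unsigned char *base = arr;
    for (size_t pass = 0; pass + 1 < numElements; pass++) {
        int swapped = 0;
        for (size_t i = 0; i + 1 < numElements - pass; i++) {
            unsigned char *cur = base + i * sizeOfElement;
            unsigned char *next = cur + sizeOfElement;
            if (compareFcn(cur, next) > 0) {
                swapBytes(cur, next, sizeOfElement);
                swapped = 1;
            }
        }
        if (!swapped)
            break;   /* no swaps in this pass: already sorted */
    }
    return arr;
}

/* Example comparator in the required form, for an array of int. */
int compareInts(void *a, void *b) {
    int x = *(int *)a;
    int y = *(int *)b;
    return (x > y) - (x < y);
}
```

Since the sort only ever sees raw bytes plus an element size, the same function works for any element type, as long as the comparator converts its void * arguments back to the right type.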
I've got a structure:
typedef struct personalData
{
char name[20];
char * remarks;
int age;
float weight;
} personalData;
I need to sort data by weight using qsort(). Here's my weightSort function:
void weightSort(personalData * data[], int len)
{
qsort(data, len, sizeof(struct personalData *), structSortWeight);
}
where len = 10 (calculated earlier by other functions, but that probably doesn't matter), and data[] is defined in main():
struct personalData * data[10];
and finally structSortWeight:
int structSortWeight(const void *a, const void *b)
{
personalData *p1 = (personalData *)a;
personalData *p2 = (personalData *)b;
return (p1->weight - p2->weight);
}
My program crashes when it starts sorting. I should add that when I change the third argument of qsort() to sizeof(float), it doesn't crash, but p1->weight and p2->weight contain garbage.
The calling function in main():
weightSort(data, len);
personalData * data[] has some data already assigned.
This array declaration:
struct personalData * data[10];
does not declare an object suitable for use as the first parameter to
void weightSort(personalData data[], int len)
The variable and function parameter have different levels of indirection. Your actual data is an array of pointers to struct personalData, whereas the function parameter is appropriate for an array of the structures themselves. This produces undefined behavior. Probably more functionally relevant, the item size that you pass to qsort() is therefore incorrect: with the data as declared in main(), you want as item size not the size of struct personalData, but rather the size of a pointer to one (sizeof(struct personalData *)).
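To make the two levels of indirection concrete, here is a sketch of both shapes using the question's personalData type. The function names weightSortStructs and weightSortPtrs and the two comparators are hypothetical. In each case, the element size passed to qsort and the type the comparator converts to both follow from what one array element actually is.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct personalData {
    char name[20];
    char *remarks;
    int age;
    float weight;
} personalData;

/* Elements are structs: each argument points at a personalData. */
static int cmpStruct(const void *a, const void *b) {
    const personalData *p1 = a;
    const personalData *p2 = b;
    return (p1->weight > p2->weight) - (p1->weight < p2->weight);
}

/* Elements are pointers: each argument points at a personalData *. */
static int cmpPtr(const void *a, const void *b) {
    personalData *const *p1 = a;
    personalData *const *p2 = b;
    return ((*p1)->weight > (*p2)->weight) - ((*p1)->weight < (*p2)->weight);
}

void weightSortStructs(personalData data[], size_t len) {
    qsort(data, len, sizeof data[0], cmpStruct);   /* sizeof(personalData) */
}

void weightSortPtrs(personalData *data[], size_t len) {
    qsort(data, len, sizeof data[0], cmpPtr);      /* sizeof(personalData *) */
}
```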
Furthermore, your comparison function is wrong. In the first place, it must return an int, not a float, but in the second place, since the elements you are sorting are pointers to structures, the arguments presented to the comparison function will be pointers to such pointers. You treat them instead as pointers directly to structures.
Your comparison function receives pointers to two elements of the list, each of which is also a pointer. So the real type of each parameter is personalData **, but you cast them to personalData *. So you're treating a pointer as a structure instance, which is why you're seeing garbage.
You need to add an additional level of indirection in your comparison function:
int structSortWeight(const void *a, const void *b)
{
// no need to cast from void *
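// each element is a personalData *, so a and b point to such pointers
personalData *const *p1 = a;
personalData *const *p2 = b;
// compare rather than subtract: a float difference truncated to int is unreliable
return ((*p1)->weight > (*p2)->weight) - ((*p1)->weight < (*p2)->weight);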
}
The signature for the qsort comparison function is
int (*comp)(const void *, const void *)
Returning a float does not work at all.
return (p1->weight - p2->weight); is not a suitable comparison. It subtracts two float values and converts the result to int. The comparison function must return sensible, consistent results.
Consider weights A,B,C: 1.1, 2.0, 2.9.
Comparing f(A,B) returns int 0.
Comparing f(B,C) returns int 0.
Comparing f(A,C) returns int -1. // This does not make sense if A==B and B==C
This inconsistency can fool qsort() resulting in undefined behavior (UB).
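The comparisons can be reproduced with two small helpers that work on bare float values rather than whole structs (a simplification made to keep the sketch short). subCompare mimics the question's truncating subtraction and relCompare uses the two-comparison idiom; both names are hypothetical.

```c
/* Mimics the question's compare: the float difference is truncated to int. */
int subCompare(float x, float y) {
    return (int)(x - y);
}

/* Two relational comparisons, each yielding 0 or 1; the result is -1, 0 or 1. */
int relCompare(float x, float y) {
    return (x > y) - (x < y);
}
```

For the weights 1.1, 2.0 and 2.9, subCompare reports the first two pairs as equal but the outer pair as unequal, because differences of about -0.9 truncate to 0 while a difference of about -1.8 truncates to -1. relCompare returns -1 for all three pairs, which is consistent.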
A better compare function
int structSortWeight2(const void *a, const void *b) {
// each element is a personalData *, so a and b point to such pointers
personalData *const *p1 = a;
personalData *const *p2 = b;
// 2 compares are done, each returning an `int`
return ((*p1)->weight > (*p2)->weight) - ((*p1)->weight < (*p2)->weight);
}
The code has other issues, as detailed in @John Bollinger's answer.
I have a struct which looks somewhat like this:
struct Data
{
int a;
float b;
char *c;
int (*read)(struct Data *data, int arg1);
int (*write)(struct Data *data, int arg1, int arg2);
int (*update)(struct Data *data, int arg1, int arg2, int arg3);
int (*erase)(struct Data *data, int arg1);
/* ... */
};
The ... means that there are a bunch of other function pointers similar to the ones above (that is, they all return an int and take a pointer to Data as the first argument, but the other arguments may differ).
Let's say there are 20 function pointers total. In a special function DataInit(), I assign functions to them, like this:
Data->read = readA;
Data->write = writeA;
/* readA() and writeA() are functions defined elsewhere in the code, with argument lists same as corresponding function pointers */
Now I have to do the same for another object of type Data, which differs in that it is "read-only": of those 20 function pointers, 15 have to be assigned functions that, when invoked, return the error code NOT_SUPPORTED. The rest stay the same (for example, readA() is assigned to the function pointer (*read) as above).
I was wondering if there's a way to do it without implementing a function for each pointer (for example, updateB() that takes three arguments and its body is just return NOT_SUPPORTED). Unfortunately, I cannot just set them to NULL.
I was thinking about preprocessor macros but it's black magic to me, honestly.
No, you may not cast a function pointer to a function pointer of different type (or even worse, to a different pointer type). This causes undefined behavior in the C standard for a good reason.
There are currently architectures out there where this isn't just a theoretical problem that everyone gets away with, but it can actually crash your program in unexpected ways. Read this blog post if you want details.
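Because casting is ruled out, one cast-free way to avoid writing every stub by hand is to let the preprocessor generate them, as the question speculated. This is only a sketch: the macro UNSUPPORTED_STUB, the stub names, the reduced three-slot struct Data, readA's body and the value of NOT_SUPPORTED are all assumptions. Each generated stub has exactly the type of the pointer it is stored in, so nothing is converted; you still need one line per distinct signature, but not one per slot. With -Wextra, GCC may warn about the unused stub parameters; those warnings are harmless here.

```c
#define NOT_SUPPORTED (-1)   /* hypothetical error code */

/* Reduced version of the question's struct: three of the ~20 slots. */
struct Data {
    int (*read)(struct Data *data, int arg1);
    int (*write)(struct Data *data, int arg1, int arg2);
    int (*update)(struct Data *data, int arg1, int arg2, int arg3);
};

/* Defines a function `name` with the given parenthesised parameter list
 * that ignores its arguments and returns NOT_SUPPORTED. */
#define UNSUPPORTED_STUB(name, params) \
    int name params { return NOT_SUPPORTED; }

UNSUPPORTED_STUB(write_unsupported, (struct Data *d, int a1, int a2))
UNSUPPORTED_STUB(update_unsupported, (struct Data *d, int a1, int a2, int a3))

/* Stand-in for the real readA(), which the question defines elsewhere. */
int readA(struct Data *data, int arg1) {
    (void)data;
    return arg1;
}

/* Read-only initialisation: read works, the mutating slots are stubbed. */
void DataInitReadOnly(struct Data *data) {
    data->read = readA;
    data->write = write_unsupported;
    data->update = update_unsupported;
}
```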
I don't know whether my suggestion is legal or not, but I want to suggest this:
int data_not_supported_(struct Data *thiz, ...)
{
return NOT_SUPPORTED;
}
There may be no problem in practice if your compiler uses the cdecl calling convention, where the caller cleans up the arguments, so the called function does not depend on how many were passed.
Yes, you can use a single function
int unsupported() {
return NOT_SUPPORTED;
}
and cast to correct the function pointer type when initializing your struct:
Data->write = (int (*)(struct Data *, int, int))unsupported;
These casts are ugly, so it's more readable to have a typedef for each function:
typedef int
(*write_t)(struct Data *, int, int);
And then:
Data->write = (write_t)unsupported;
As mentioned, function pointer casts will most likely result in undefined behavior on most systems.
A feasible solution to the problem is this:
struct Data; /* forward declaration, so the typedef refers to the struct defined below */
typedef int func_t (struct Data* this, void* arg);
struct Data
{
int a;
float b;
char* c;
func_t* read;
func_t* write;
...
};
// later on in the code:
int update_function (struct Data* this, void* arg)
{
struct my_type* m = (struct my_type*)arg;
// use m, then report a status code
return 0;
}
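A fuller sketch of this approach, with two slots instead of twenty. The member names, struct update_args, the value of NOT_SUPPORTED and the function bodies are hypothetical, and the parameter is called self rather than this because this is a C++ keyword. Since every slot has the same type func_t *, a single not_supported function can fill any of them without a cast, which is what the question was after.

```c
#include <stddef.h>

#define NOT_SUPPORTED (-1)   /* hypothetical error code */

struct Data;   /* forward declaration for the function type */

/* Every operation shares one signature; extra arguments travel in *arg. */
typedef int func_t(struct Data *self, void *arg);

struct Data {
    int a;
    func_t *read;
    func_t *update;
};

/* Hypothetical argument bundle for update(). */
struct update_args {
    int arg1, arg2, arg3;
};

/* Because all slots have type func_t *, one stub fits all of them. */
int not_supported(struct Data *self, void *arg) {
    (void)self;
    (void)arg;
    return NOT_SUPPORTED;
}

int update_function(struct Data *self, void *arg) {
    const struct update_args *m = arg;   /* void * converts implicitly */
    self->a = m->arg1 + m->arg2 + m->arg3;
    return 0;
}

int read_function(struct Data *self, void *arg) {
    (void)arg;
    return self->a;
}
```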
I am implementing a generic singly linked list where list nodes store a pointer to their data.
typedef struct sll_node
{
void *data;
struct sll_node *next;
} sll_node;
To implement a generic find subroutine that works with any kind of data, I wrote it so that it takes as an argument a function pointer to the comparison function as follows:
/* eq() must take 2 arguments. ex: strcmp(char *, char *) */
sll_node *sll_find(void *data, int (*eq)(), sll_node *root);
You can pass the appropriate function pointer for the data type at hand. So if you store strings in the list nodes, you can pass strcmp as the eq() function, and so on. It works, but I'm still not satisfied.
Is there a way to explicitly specify the number of comparison function parameters without giving up its generality?
I tried this at first:
sll_node *sll_find(void *data, int (*eq)(void *, void *), sll_node *root);
I expected it to work. But no (edit: it compiles with a warning but I have -Werror on!), I had to write a wrapper function around strcmp to make it conform to the eq prototype.
I then tried:
sll_node *sll_find(void *data, int (*eq)(a, b), sll_node *root);
or:
typedef int (*equality_fn)(a, b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
which both wouldn't compile since: "a parameter list without types is only allowed in a function definition"
To use strcmp without a wrapper or a cast, the declaration needs to be
sll_node *findNode(void *data, int (*eq)(const char *, const char *), sll_node *root);
On the other hand, if you declare the args as const void *, then you can avoid the wrapper by casting strcmp to the appropriate type.
Method 1: direct cast, messy but effective
result = findNode( "hello", (int(*)(const void *, const void *))strcmp, root );
Method 2: typedef the comparison function, and then use it to cast
typedef int (*cmpfunc)(const void *, const void *);
result = findNode( "world", (cmpfunc)strcmp, root );
Edit: After reading this post that @WilburVandrsmith linked, I've decided to leave this answer as is. I leave it up to the reader to decide whether the proposed cast violates the following paragraph from the specification:
If a converted pointer is used to call a function whose type is not
compatible with the pointed-to type, the behavior is undefined.
Compatible or not compatible, that is the question, you decide.
Your last attempted solution is the closest to being correct. The parameters in your defined-type function pointer need to be declared with their data types, just like you would with a regular function declaration, like so:
typedef int (*equality_fn)(char *a, char *b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
UPDATE
To make it more generic use void pointers, and then type cast the passed void pointers to the needed data type in the matching function definition for equality_fn:
typedef int (*equality_fn)(void *a, void *b);
sll_node *sll_find(void *data, equality_fn eq, sll_node *root);
Something else important to remember is that a pointer is a pointer is a pointer, regardless of what it's pointing at or how it was originally defined. So, you can have some function pointer, or a void pointer, or a pointer to a byte, a char, an int--anything--as long as you handle it properly in your code and cast it back to a valid type before attempting to use it.
Something else that most coders don't take much advantage of in C is that function names themselves are really just addresses that are called at run-time, and so they are also pointers. ;)
My solution to this conundrum would be (avoiding pointer typedefs, incidentally):
typedef int equality_fn(const void *a, const void *b);
sll_node *sll_find(void *data, equality_fn *eq, sll_node *root);
Then make all your comparators be of type equality_fn. If you need to actually have a function then so be it:
equality_fn eq_strcmp; // a prototype
// ...
int eq_strcmp(const void *a, const void *b) { return strcmp(a, b); }
Gain lots of type safety in exchange for a potential picoscopic runtime penalty; which end of this trade you want to be on depends on your application.
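A sketch of how sll_find might use that typedef. The loop body is an assumption, as is the convention that eq returns 0 on a match (borrowed from strcmp, which the question uses as its example).

```c
#include <stddef.h>
#include <string.h>

typedef struct sll_node {
    void *data;
    struct sll_node *next;
} sll_node;

/* A function type (not a pointer type), as in the answer above. */
typedef int equality_fn(const void *a, const void *b);

/* Assumed convention: eq returns 0 on a match, like strcmp. */
sll_node *sll_find(void *data, equality_fn *eq, sll_node *root) {
    for (sll_node *n = root; n != NULL; n = n->next) {
        if (eq(data, n->data) == 0)
            return n;
    }
    return NULL;
}

equality_fn eq_strcmp;   /* prototype declared through the typedef */

int eq_strcmp(const void *a, const void *b) {
    return strcmp(a, b);
}
```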
In order to simplify the development of future school assignments I decided to create an API (is that what you would call it?) for two data structures I commonly use -- a linked list and a hash table.
In developing each of these I ended up with the following two insert functions:
int list_insert(list *l, char *data, unsigned int idx);
int hash_insert(hash_table **ht, char *data);
The list_insert() function (and all of the list functions) ended up being pass-by-value since I never had any need to directly modify the list * itself unless I was malloc'ing or free'ing it. However, because I wanted to include auto-rehashing in my hash table I found that I had to pass the table by-reference instead of by-value in any function that might force a rehash. Now I end up with syntax like the following:
list_insert(l, "foo", 3);
hash_insert(&ht, "foo");
The difference strikes me as a little odd and I found myself wondering if I should change the list functions to be pass-by-reference as well for consistency's sake -- even though none of my functions would need to leverage it. What's the typical consensus here? Should I only pass-by-reference if my function actually needs to modify its arguments or should I pass-by-reference for the sake of consistency?
Structure definitions:
typedef struct list_node list_node;
struct list_node {
char *data;
list_node *next;
list_node *prev;
};
typedef struct list list;
struct list {
list_node *head;
list_node *tail;
size_t size;
};
typedef struct hash_table hash_table;
struct hash_table {
list **table;
size_t entries;
size_t buckets;
float maxLoad;
unsigned int (*hash)(char*, unsigned int);
};
List functions:
list *list_createList();
list_node *list_createNode();
void list_destroyList(list *l);
void list_destroyNode(list_node *n);
int list_append(list *l, char *data);
int list_insert(list *l, char *data, unsigned int idx);
int list_remove(list *l, char *data, int (*compar)(const void*, const void*));
void list_push(list *l, char *data);
char *list_pop(list *l);
int list_count(list *l, char *data, int (*compar)(const void*, const void*));
int list_reverse(list *l);
int list_sort(list *l, int (*compar)(const void*, const void*));
int list_print(list *l, void (*print)(char *data));
Hash functions:
hash_table *hash_createTable(size_t buckets, float maxLoad, unsigned int (*hash)(char*, unsigned int));
void hash_destroyTable(hash_table *ht);
list *hash_list(const hash_table **ht);
int hash_checkLoad(hash_table **ht);
int hash_rehash(hash_table **ht);
int hash_insert(hash_table **ht, char *data);
void hash_stats(hash_table *ht);
int hash_print(hash_table *ht, void (*print)(char*));
Here is a general rule of thumb:
pass by value if its typedef is a native type (char, short, int, long, long long, double or float)
pass by reference if it is a union, struct or array
Additional considerations for passing by reference:
use const if it will not be modified
use restrict if pointers will not point to the same address
Sometimes a struct/union seems like the appropriate type, but can be replaced with arrays if the types are similar. This can help with optimization (loop vectorization for example)
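A short sketch of those two qualifiers on pointer parameters. scaleAdd is a hypothetical function: const states that x and y are only read, and restrict promises the compiler that out does not overlap x or y, which can let it vectorize the loop more freely. Calling it with overlapping arrays would be undefined behavior.

```c
#include <stddef.h>

/* out[i] = a * x[i] + y[i]; inputs are read-only and must not alias out. */
void scaleAdd(size_t n, double a,
              double *restrict out,
              const double *restrict x,
              const double *restrict y) {
    for (size_t i = 0; i < n; i++)
        out[i] = a * x[i] + y[i];
}
```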
That's up to you and takes a little intuition. When passing large structs, I pass by reference so that I'm not eating up extra stack space and burning cycles copying the struct. But with small structs like yours, it may be more efficient to use the stack, depending on your target processor, how often you use the values, and what your compiler does. Your compiler may break the struct up and put its values into registers.
But if you do pass by reference and do not intend to modify the value, it is best practice to pass a pointer to const, e.g. const list *l. That way there isn't any risk of accidentally modifying the value, and it makes the interface cleaner: the caller knows that the value won't change.
Consistency is nice, and I personally would lean in that direction, especially on a large interface, because it may make things easier in the long run, but I would definitely use const. That lets the compiler catch any accidental assignments, so you don't have to track down a hard-to-find bug later.
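A sketch of two read-only list operations written this way, reusing the question's list and list_node definitions. list_length, this const-qualified list_count body and compareStrings are hypothetical, and list_count assumes compar returns 0 on a match, as strcmp does.

```c
#include <stddef.h>
#include <string.h>

typedef struct list_node list_node;
struct list_node {
    char *data;
    list_node *next;
    list_node *prev;
};

typedef struct list list;
struct list {
    list_node *head;
    list_node *tail;
    size_t size;
};

/* Read-only query: const list * tells callers (and the compiler)
 * that the list is not modified. */
size_t list_length(const list *l) {
    return l->size;
}

/* Same shape as the question's list_count, but const-qualified. */
int list_count(const list *l, const char *data,
               int (*compar)(const void *, const void *)) {
    int count = 0;
    for (const list_node *n = l->head; n != NULL; n = n->next) {
        if (compar(data, n->data) == 0)
            count++;
    }
    return count;
}

/* Example comparator for the char * payloads. */
int compareStrings(const void *a, const void *b) {
    return strcmp(a, b);
}
```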
See also: Passing a struct to a function in C
I have two functions, each taking a pointer to a different type:
void processA(A *);
void processB(B *);
Is there a function pointer type that would be able to hold a pointer to either function without casting?
I tried to use
typedef void(*processor_t)(void*);
processor_t Ps[] = {processA, processB};
but it didn't work (compiler complains about incompatible pointer initialization).
Edit: Another part of code would iterate through the entries of Ps, without knowing the types. This code would be passing a char* as a parameter. Like this:
Ps[i](data_pointers[j]);
Edit: Thanks everyone. In the end, I will probably use something like this:
void processA(void*);
void processB(void*);
typedef void(*processor_t)(void*);
processor_t Ps[] = {processA, processB};
...
void processA(void *arg)
{
A *data = arg;
...
}
If you typedef void (*processor_t)(); then this will compile in C. This is because an empty argument list leaves the number and types of arguments to a function unspecified, so this typedef just defines a type which is "pointer to function returning void, taking an unspecified number of arguments of unspecified type."
Edit: Incidentally, you don't need the ampersands in front of the function names in the initializer list. In C, a function name in that context decays to a pointer to the function.
It works if you cast them
processor_t Ps[] = {(processor_t)processA, (processor_t)processB};
By the way, if your code is riddled with this kind of thing, with switch statements all over the place to figure out which function to call, you might want to look at object-oriented programming. I personally don't like it much (especially C++...), but it does a good job of removing this kind of code with virtual functions.
This can be done without casts by using a union:
typedef struct A A;
typedef struct B B;
void processA(A *);
void processB(B *);
typedef union { void (*A)(A *); void (*B)(B *); } U;
U Ps[] = { {.A = processA}, {.B = processB} };
int main(void)
{
Ps[0].A(0); // 0 used for example; normally you would supply a pointer to an A.
Ps[1].B(0); // 0 used for example; normally you would supply a pointer to a B.
return 0;
}
You must call the function using the correct member name; this method only allows you to store one pointer or the other in each array element, not to perform weird function aliasing.
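Since the iterating code in the question does not know which member is valid, one option is to store a tag next to the union. This is a sketch and an extension of the answer above, not part of it: the tag enum, Processor, dispatch and the bodies of processA and processB are hypothetical.

```c
#include <stddef.h>

typedef struct A { int value; } A;     /* hypothetical contents */
typedef struct B { double value; } B;  /* hypothetical contents */

void processA(A *a) { a->value *= 2; }
void processB(B *b) { b->value += 0.5; }

/* The union holds either pointer type; the tag records which one. */
typedef struct {
    enum { PROC_A, PROC_B } kind;
    union {
        void (*a)(A *);
        void (*b)(B *);
    } fn;
} Processor;

static const Processor Ps[] = {
    { PROC_A, { .a = processA } },
    { PROC_B, { .b = processB } },
};

/* Calls entry i through the member that matches its tag;
 * void * converts implicitly to A * or B *. */
void dispatch(size_t i, void *obj) {
    switch (Ps[i].kind) {
    case PROC_A: Ps[i].fn.a(obj); break;
    case PROC_B: Ps[i].fn.b(obj); break;
    }
}
```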
Another alternative is to use proxy functions that do have the type needed when calling with a parameter that is a pointer to char and that call the actual function with its proper type:
typedef struct A A;
typedef struct B B;
void processA(A *);
void processB(B *);
typedef void (*processor_t)();
void processAproxy(char *p) { processA((A *) p); }
void processBproxy(char *p) { processB((B *) p); }
processor_t Ps[] = { processAproxy, processBproxy };
int main(void)
{
char *a = 0; // normally (char *) the address of some A object
char *b = 0; // normally (char *) the address of some B object
Ps[0](a);
Ps[1](b);
return 0;
}
I used char * above since you stated you are using it, but I would generally prefer void *.
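A sketch of the void * variant of the proxy approach, which also matches the question's final edit. The struct contents and function bodies are hypothetical. Because processor_t is fully prototyped as void (*)(void *), each proxy has exactly that type, and void * converts to A * or B * implicitly, so no casts are needed. A prototyped typedef also keeps working in C23, where an empty parameter list () means no parameters rather than unspecified ones.

```c
#include <stddef.h>

typedef struct A { int hits; } A;   /* hypothetical contents */
typedef struct B { int hits; } B;   /* hypothetical contents */

void processA(A *a) { a->hits += 1; }
void processB(B *b) { b->hits += 10; }

/* Fully prototyped pointer type: every proxy has exactly this type. */
typedef void (*processor_t)(void *);

/* void * converts to A * or B * implicitly, so no casts are needed. */
void processAproxy(void *p) { processA(p); }
void processBproxy(void *p) { processB(p); }

processor_t Ps[] = { processAproxy, processBproxy };
```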