Confused about Qsort and Pointers

I am a beginner programmer in C who wants to get used to terminology and pointers.
I have found the following working function prototype while searching for a way to sort the elements of a numerical array. The function was qsort and it utilized pointers. Now what I understood is that the word "const" ensures that the values a and b are unchanged but not the pointers. Correct me if I am wrong here. My questions are:
Why do we use void * in the function? Can we not use int * from the start?
How does the construction *(int*)a in the return part work?
Why does the qsort algorithm need this many arguments?
int compare (const void *a, const void *b)
{
return ( *(int*)a - *(int*)b );
}
Many thanks for the answers.
PS: That is a pretty complicated task for me.

qsort was made this way so it could be used as a generic sorter. If it used int * from the start, it could only be used to compare integers. This way you can also, for example, sort strings by passing a compare function built on strcmp to qsort.
*(int*)a casts a to a pointer-to-int and then dereferences it, so you get the integer stored at a. Note that this doesn't change a or the value that a points to.
qsort requires 4 arguments: the array to sort, the number of elements in that array and the size of the elements and finally the compare function. It needs all this information, because again, it is made to be as generic as possible.
It needs the number of elements because in C pointers don't carry information about the size of the buffer that follows them. And it needs to know the size of each element so it can pass the elements to the compare function correctly. For example, to sort ints you would pass sizeof(int) as the size parameter. To sort an array of strings (that is, of char * elements) you would use sizeof(char *).
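As a rough illustration of the four arguments described above, the sketch below sorts an int array and an array of C strings with the same qsort. The helper names compare_ints, compare_strings, sort_ints and sort_strings are made up for this example. Note that for an array of char * each element handed to the comparator is a pointer to a char *, which is why strcmp is wrapped in a small function here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare two ints without subtracting, so large values cannot overflow. */
int compare_ints(const void *a, const void *b)
{
    const int *ia = a;
    const int *ib = b;
    return (*ia > *ib) - (*ia < *ib);
}

/* Each element is a char *, so qsort hands us pointers to char *. */
int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Arguments to qsort: base, element count, element size, comparison function. */
void sort_ints(int *values, size_t count)
{
    assert(values != NULL);
    qsort(values, count, sizeof values[0], compare_ints);
}

void sort_strings(const char **strings, size_t count)
{
    assert(strings != NULL);
    qsort(strings, count, sizeof strings[0], compare_strings);
}
```

Writing sizeof values[0] instead of sizeof(int) keeps the size argument correct if the element type ever changes.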
ADDIT as suggested by H2CO3 the reason for using const void * is to indicate that the compare function may not change the value pointed to by a and b. This is, of course, to ensure that sorting the array doesn't suddenly change the values in the array. And, as H2CO3 said, it would be cleaner to cast to (const int *) so that you cannot accidentally change the value after casting it:
return *(const int *)a - *(const int *)b;
You could also get rid of the cast with:
int compare(const void * a, const void * b){
const int * ia = a;
const int * ib = b;
return *ia - *ib;
}
depending on your tastes regarding casts. (I prefer to avoid them)
Finally, to clarify the asterisks:
*(int *)a
^ ^
| └ cast to integer pointer
└ dereference (integer) pointer

Now what I understood is that the word "const" ensures that the values a and b are unchanged but not the pointers
You understood wrong.
const int *a;
This declares a as a pointer to a constant int. The word const ensures that you can't modify the value of the variable a points to through *a; the pointer a itself can still be reassigned.
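To make the distinction concrete, here is a small sketch (the function name read_only_sum is made up for the example). The const in const int *p protects the int being pointed at, while the pointer variable itself can still be changed to point elsewhere; the commented-out line is one that a compiler would reject.

```c
#include <assert.h>
#include <stddef.h>

/* Sums count ints; p is a pointer to const int, so *p is read-only here. */
int read_only_sum(const int *p, size_t count)
{
    assert(p != NULL);
    int sum = 0;
    for (size_t i = 0; i < count; ++i) {
        sum += *p;   /* reading through p is fine */
        /* *p = 0;      would not compile: *p is const */
        ++p;         /* changing the pointer itself is allowed */
    }
    return sum;
}
```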
Why do we use void * the function can we not use int * from the start?
void * can point to a variable of any type.
How does the construction *(int*)a in the return part work?
*(int *)a casts a to a pointer to int and then dereferences it to get the value stored at the location it points to.

The other answers are excellent. I just want to add that it's often easier to read if you are very clear in your callback function.
int compare (const void *a, const void *b)
{
return ( *(int*)a - *(int*)b );
}
becomes
int compare (const void *a, const void *b)
{
int ia = *(int *)a;
int ib = *(int *)b;
return ia - ib;
}
In this case it's not too important, but as your compare function gets more complex, you may want to get your variables into "your type" before doing the compare.
Since you asked in the comment below, here is a very step by step version:
int compare (const void *a, const void *b)
{
int *pa = (int *)a;
int *pb = (int *)b;
int ia = *pa;
int ib = *pb;
return ia - ib;
}
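The same pattern pays off once the elements are structs. A sketch, assuming a hypothetical struct person with name and age fields (not something from the question): convert the void pointers to typed pointers first, then write the comparison in terms of named fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct person {
    const char *name;
    int age;
};

/* Order by age, then by name for equal ages. */
int compare_people(const void *a, const void *b)
{
    const struct person *pa = a;
    const struct person *pb = b;

    if (pa->age != pb->age)
        return (pa->age > pb->age) - (pa->age < pb->age);
    return strcmp(pa->name, pb->name);
}

void sort_people(struct person *people, size_t count)
{
    assert(people != NULL);
    qsort(people, count, sizeof people[0], compare_people);
}
```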

The qsort() function is an example of a generic algorithm implemented as a general-purpose routine. The idea is to make it useful for sorting arbitrary objects, not just int or float. Because of that (and because of the C language design), qsort() resorts to taking a comparison function as a parameter that accepts two generic (in the C sense) pointers. It is up to that function (provided by the qsort() user) to convert these pointers to the correct type, perform the correct comparison and return an indication of ordering.
Similarly, since qsort() doesn't know beforehand how large objects are, it takes the object size as a parameter. As far as qsort() is concerned, the objects are blobs of bytes of equal size contiguously arranged in memory.
Finally, since none of the operations qsort() performs can cause an error, it doesn't return an error code. There is actually one situation where qsort() might fail, namely when illegal parameters are passed to it, but in the tradition of many other standard C library routines it does not guarantee any error checking on its parameters; the behavior is undefined in such a case.
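To illustrate the "blobs of bytes" view, here is a minimal sketch of a generic sort with the same parameter list as qsort(). It is a plain insertion sort written for clarity, not how any real C library implements qsort(); the names generic_sort, swap_bytes and compare_doubles are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two non-overlapping objects of `size` bytes each, one byte at a time. */
void swap_bytes(void *x, void *y, size_t size)
{
    unsigned char *px = x;
    unsigned char *py = y;
    for (size_t i = 0; i < size; ++i) {
        unsigned char tmp = px[i];
        px[i] = py[i];
        py[i] = tmp;
    }
}

/* Insertion sort over `num` elements of `size` bytes starting at `base`. */
void generic_sort(void *base, size_t num, size_t size,
                  int (*compar)(const void *, const void *))
{
    assert(base != NULL && compar != NULL && size > 0);
    unsigned char *bytes = base;   /* byte pointer, so address arithmetic is allowed */

    for (size_t i = 1; i < num; ++i) {
        /* Move element i left until its left neighbour is not greater. */
        for (size_t j = i; j > 0; --j) {
            void *left = bytes + (j - 1) * size;
            void *right = bytes + j * size;
            if (compar(left, right) <= 0)
                break;
            swap_bytes(left, right, size);
        }
    }
}

/* Example comparator: only this function knows the elements are doubles. */
int compare_doubles(const void *a, const void *b)
{
    const double *da = a;
    const double *db = b;
    return (*da > *db) - (*da < *db);
}
```

The sort never looks inside an element: it only computes addresses from the element size and leaves all interpretation to the comparator.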

Related

C function pointers

I am learning C from "C by K&R". I was going through the function pointers section. There was an example of sorting an array of strings using function pointers and void pointers (to be specific, on page 100). I have a fair understanding of function pointers and void pointers.
The example given there calls
qsort((void**) lineptr, 0, nlines-1,(int (*)(void*,void*))(numeric ? numcmp : strcmp));
And it seamlessly uses void pointers, as below, to compare and swap.
I understand that it takes an array of pointers and that each element by itself is a void pointer to a string. How is it possible to compare and swap one void pointer with another?
void sort(void *v[],int i,int j)
{
void *temp;
temp = v[i];
v[i] = v[j];
v[j] = temp;
}
Can anyone explain the concept behind this.

How is it possible to compare, swap a void ptr with another?
Compare: comparing the void pointers themselves is meaningless, as their values are just addresses. The comparison is done by the function passed in (numcmp or strcmp), which interprets the data the pointers point to.
Swap: A pointer is a variable holding an address. By changing a pointer's value you change the address it points to. The data itself is not touched here.
Note: void pointers do not interpret the data they are pointing to. That is why you need an explicit type conversion before you dereference them, so that the data they point to matches the type of the variable it is assigned to.

Remember that pointers are just variables that store a memory address. As long as there is no conflict between types, I can't see why this shouldn't be possible!
The only difference between a void pointer and any other pointer is that you must take care when dereferencing it (you need to convert it to a typed pointer first).
For example:
void *ptr;
int m, n;
ptr = &n;
m = *((int *) ptr);
Apart from that, you can work with void pointers normally. You can, as your code shows, swap them just as if they were ints or variables of any other type.

The function pointer required by qsort() has the following type
int (*compar)(const void *, const void *);
This means that you can pass pointers of any type to this function, since in C a void * converts to and from any object pointer type without a cast.
Inside a comparison function, you MUST "cast" [1] the void * pointers in order to be able to dereference them, because a void * pointer cannot be dereferenced.
Swapping pointers is the correct way to sort an array of pointers, just like swapping integers would be the way to sort an array of integers. The other way, with an array of strings for example, would be to copy the string to a temporary buffer and perform the swap by copying the data, and I think there is no need to explain why this is bad.
[1]
When I say cast I don't mean that you need an explicit cast, just a conversion to the appropriate pointer type. For example:
int compare_integers(const void *const x, const void *const y)
{
/* pointers to const int, so the const qualifier of x and y is not discarded */
const int *X;
const int *Y;
X = x;
Y = y;
return (*X - *Y);
}
although it's of course possible to write return (*((int *) x) - *((int *) y)).
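A small sketch of sorting by swapping pointers, in the spirit of the K&R example from the question. The function names swap_ptrs, sort_ptrs and compare_cstrings are made up here, and a simple selection sort stands in for the quicksort in the book. Only the addresses stored in the array move; the strings themselves stay where they are.

```c
#include <assert.h>
#include <string.h>

/* Exchange the pointers stored in v[i] and v[j]. */
void swap_ptrs(void *v[], int i, int j)
{
    void *temp = v[i];
    v[i] = v[j];
    v[j] = temp;
}

/* Selection sort on an array of n pointers; comp looks at the pointed-to data. */
void sort_ptrs(void *v[], int n, int (*comp)(const void *, const void *))
{
    assert(v != NULL && comp != NULL);
    for (int i = 0; i < n - 1; ++i) {
        int smallest = i;
        for (int j = i + 1; j < n; ++j)
            if (comp(v[j], v[smallest]) < 0)
                smallest = j;
        if (smallest != i)
            swap_ptrs(v, i, smallest);
    }
}

/* Here the void pointers are the strings themselves, so convert to char *. */
int compare_cstrings(const void *a, const void *b)
{
    const char *sa = a;
    const char *sb = b;
    return strcmp(sa, sb);
}
```

Unlike a qsort comparator for an array of char *, this comparator receives the element values directly, because sort_ptrs passes v[j] itself rather than the address of v[j].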

In this type of situation, it's often helpful to typedef to gain a better understanding. For illustration purposes, you could do
typedef void* address; //to emphasize that a variable of type void* stores an address
Now your swap function looks less daunting,
void swap(address v[],int i,int j) //takes an array of addresses v
{
address temp;
temp = v[i];
v[i] = v[j];
v[j] = temp;
}
A void *, however, contains no information regarding the type of object it points to. So before dereferencing it, you need to cast it to the right type, which is what strcmp and numcmp do, e.g.,
int strcmp(address a1, address a2) { //assumes a1 and a2 store addresses of strings
char *s1 = a1;
char *s2 = a2;
//s1 and s2 can be dereferenced and the strings they point to can be compared
}

Function comparing integers in C (pointers)

Here's the following function which is supposed to compare the values of two integers a and b and return a positive number if a>b and a negative number otherwise:
int int_cmp(const void *a, const void *b)
{
const int *ia = (const int*)a;
const int *ib = (const int*)b;
return *ia - *ib;
}
I am not too familiar with constant pointers (or pointers to constants) and I do not really understand the reasoning behind the function above. I would appreciate it if someone could provide a step-by-step explanation.

Suppose that, in the calling function, you have two int variables,
int p = 10;
int q = 5;
Now, from your main(), you call int_cmp(&p, &q); to compare their values.
In the receiving function int_cmp(), the parameters are pointers to const so that, inside int_cmp(), the values of p and q cannot be changed through them. If the values that a and/or b point to were changed in int_cmp(), they would change in main() as well, because the function receives the addresses of p and q rather than copies of their values. So, to keep the values unchanged, const is used.
Next, once the parameters are received in int_cmp(), they are converted to const int *, because a void * cannot be dereferenced; the pointer must have a concrete type before the value it points to can be read.
I hope the arithmetic part is quite straightforward. It dereferences the pointers and returns the difference between the values that a and b point to.
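Putting the steps together in a compilable sketch: int_cmp is the function from the question, and int_cmp_safe is a variant added here (the name is made up) to cover one caveat. The difference *ia - *ib can overflow when the two values are far apart, for example INT_MAX and -1, whereas comparing with > and < cannot.

```c
#include <assert.h>
#include <stddef.h>

/* The function from the question: difference of the pointed-to ints. */
int int_cmp(const void *a, const void *b)
{
    const int *ia = (const int *)a;   /* view the generic pointer as pointer to const int */
    const int *ib = (const int *)b;
    return *ia - *ib;                 /* positive if *ia > *ib, negative if smaller, 0 if equal */
}

/* Same ordering, but yields only -1, 0 or 1 and never overflows. */
int int_cmp_safe(const void *a, const void *b)
{
    const int *ia = a;
    const int *ib = b;
    assert(ia != NULL && ib != NULL);
    return (*ia > *ib) - (*ia < *ib);
}
```

With p = 10 and q = 5, int_cmp(&p, &q) returns 5 and int_cmp_safe(&p, &q) returns 1; in both cases p and q are left unchanged.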

I'm guessing this method is used in more general callback that expects a function pointer of the following type
int (*)(const void*, const void*)
This is the only reason I can see to use const void* here instead of const int*.
The reason for const is that comparison should be an operation that only reads data. It shouldn't have any need to mutate the parameters in order to compare them. Hence the standard definition of compare takes const data to encourage implementers to have the correct behavior.

Passing dynamically allocated array as a parameter in C

So... I have a dynamically allocated array on my main:
int main()
{
int *array;
int len;
array = (int *) malloc(len * sizeof(int));
...
return EXIT_SUCCESS;
}
I also wanna build a function that does something with this dynamically allocated array.
So far my function is:
void myFunction(int array[], ...)
{
array[position] = value;
}
If I declare it as:
void myFunction(int *array, ...);
Will I still be able to do:
array[position] = value;
Or I will have to do:
*array[position] = value;
...?
Also, if I am working with a dynamically allocated matrix, which one is the correct way to declare the function prototype:
void myFunction(int matrix[][], ...);
Or
void myFunction(int **matrix, ...);
...?

If I declare it as:
void myFunction(int *array, ...);
Will I still be able to do:
array[position] = value;
Yes - this is legal syntax.
Also, if I am working with a dynamically allocated matrix, which one
is correct to declare the function prototype:
void myFunction(int matrix[][], ...);
Or
void myFunction(int **matrix, ...);
...?
If you're working with more than one dimension, you'll have to declare the size of all but the first dimension in the function declaration, like so:
void myFunction(int matrix[][100], ...);
This syntax won't do what you think it does:
void myFunction(int **matrix, ...);
matrix[i][j] = ...
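This declares a parameter named matrix that is a pointer to a pointer to int. If what you pass is really a contiguous two-dimensional array, the types do not match, and dereferencing with matrix[i][j] will likely cause a segmentation fault. (It is fine when matrix really points to an array of int * row pointers.)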
This is one of the many difficulties of working with a multi-dimensional array in C.
Here is a helpful SO question addressing this topic:
Define a matrix and pass it to a function in C
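For what it's worth, int **matrix with matrix[i][j] is fine when the matrix really is an array of row pointers, each row allocated separately; it goes wrong when a true two-dimensional array such as int m[10][100] is passed where int ** is expected. A sketch of the row-pointer layout, with made-up helper names alloc_matrix, fill_matrix and free_matrix:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate rows x cols ints as an array of row pointers; NULL on failure. */
int **alloc_matrix(size_t rows, size_t cols)
{
    int **matrix = malloc(rows * sizeof *matrix);
    if (matrix == NULL)
        return NULL;
    for (size_t i = 0; i < rows; ++i) {
        matrix[i] = malloc(cols * sizeof *matrix[i]);
        if (matrix[i] == NULL) {
            /* Undo the rows allocated so far. */
            while (i > 0)
                free(matrix[--i]);
            free(matrix);
            return NULL;
        }
    }
    return matrix;
}

/* matrix[i][j] works because matrix[i] is itself a pointer to a row. */
void fill_matrix(int **matrix, size_t rows, size_t cols, int value)
{
    assert(matrix != NULL);
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            matrix[i][j] = value;
}

/* Release every row, then the array of row pointers. */
void free_matrix(int **matrix, size_t rows)
{
    if (matrix == NULL)
        return;
    for (size_t i = 0; i < rows; ++i)
        free(matrix[i]);
    free(matrix);
}
```

The trade-off is that the rows are separate allocations, so the data is not contiguous in memory.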

Yes, please use array[position], even if the parameter type is int *array. The alternative you gave (*array[position]) is actually invalid in this case, since the [] operator takes precedence over the * operator, making it equivalent to *(array[position]), which tries to dereference the value of array[position] (an int), not its address.
It gets a little more complicated for multi-dimensional arrays but you can do it:
int m = 10, n = 5;
int matrixOnStack[m][n];
matrixOnStack[0][0] = 0; // OK
matrixOnStack[m-1][n-1] = 0; // OK
// matrixOnStack[10][5] = 0; // Not OK. Compiler may not complain
// but nearby data structures might.
int (*matrixInHeap)[n] = malloc(sizeof(int[m][n]));
matrixInHeap[0][0] = 0; // OK
matrixInHeap[m-1][n-1] = 0; // OK
// matrixInHeap[10][5] = 0; // Not OK. coloring outside the lines again.
The way the matrixInHeap declaration should be interpreted is that the 'thing' pointed to by matrixInHeap is an array of n int values, so sizeof(*matrixInHeap) == n * sizeof(int), or the size of an entire row in the matrix. matrixInHeap[2][4] works because matrixInHeap[2] is advancing the address matrixInHeap by 2 * sizeof(*matrixInHeap), which skips two full rows of n integers, resulting in the address of the 3rd row, and then the final [4] selects the fifth element from the third row. (remember that array indices start at 0 and not 1)
You can use the same type when pointing to normal multidimensional c-arrays, (assuming you already know the size):
int (*matrixPointer)[n] = matrixOnStack; // or: = matrixInHeap;
Now let's say you want to have a function that takes one of these variably sized matrices as a parameter. When the variables were declared earlier the type had some information about the size (both dimensions in the stack example, and the last dimension n in the heap example). So the parameter type in the function definition is going to need that n value, which we can actually do, as long as we include it as a separate parameter, defining the function like this:
void fillWithZeros(int m, int n, int (*matrix)[n]) {
for (int i = 0; i < m; ++i)
for (int j = 0; j < n; ++j)
matrix[i][j] = 0;
}
If we don't need the m value inside the function, we could leave it out entirely, just as long as we keep n:
bool isZeroAtLocation(int n, int (*matrix)[n], int i, int j) {
return matrix[i][j] == 0;
}
And then we just include the size when calling the functions:
fillWithZeros(m, n, matrixPointer);
assert(isZeroAtLocation(n, matrixPointer, 0, 0));
It may feel a little like we're doing the compiler's work for it, especially in cases where we don't use n inside the function body at all (or only as a parameter to similar functions), but at least it works.
One last point regarding readability: using malloc(sizeof(int[len])) is equivalent to malloc(len * sizeof(int)) (and anybody who tells you otherwise doesn't understand structure padding in c) but the first way of writing it makes it obvious to the reader that we are talking about an array. The same goes for malloc(sizeof(int[m][n])) and malloc(m * n * sizeof(int)).

Will I still be able to do:
array[position] = value;
Yes, because the index operator p[i] is 100% identical to *(p + i). You can in fact write 5[array] instead of array[5] and it will still work. In C, an array used in an expression is converted to a pointer to its first element in most contexts. One notable exception is sizeof: applied to a "true" array identifier it gives the actual storage size allocated, while applied to a pointer it just gives the size of the pointer, which is often the same as the machine's word size (it can be different, though).
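A short sketch of these two points (the function names are made up). Inside a function, an array parameter is really a pointer, so sizeof reports the pointer size there, while for a locally defined array the full storage size is reported.

```c
#include <assert.h>
#include <stddef.h>

/* Writing the parameter as int values[] would mean exactly the same: int *. */
size_t size_seen_by_callee(int *values)
{
    return sizeof values;            /* size of a pointer, not of the caller's array */
}

/* sizeof on a true array gives the size of all its elements. */
size_t size_of_local_array(void)
{
    int values[10];
    return sizeof values;            /* 10 * sizeof(int) */
}

/* values[i], *(values + i) and i[values] all name the same element. */
int same_element(const int *values, size_t i)
{
    assert(values != NULL);
    return values[i] == *(values + i) && values[i] == i[values];
}
```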
Also, if I am working with a dynamically allocated matrix, which one is the correct way to declare the function prototype: (…)
Neither of them because those are arrays of pointers to arrays, which can be non-contiguous. For performance reasons you want matrices to be contiguous. So you just write
void foo(int matrix[])
and internally calculate the right offset, like
matrix[width*j + i]
Note that writing this using the bracket syntax looks weird. Also note that if you take the sizeof of a pointer or of an "array of unspecified length" function parameter, you'll get the size of a pointer.
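A sketch of the flat layout described above, with hypothetical helper names matrix_at and make_identity_like; width is the number of columns, i the column index and j the row index, matching the matrix[width*j + i] formula.

```c
#include <assert.h>
#include <stdlib.h>

/* Address of the element in column i, row j of a contiguous matrix. */
int *matrix_at(int matrix[], size_t width, size_t i, size_t j)
{
    assert(matrix != NULL && i < width);
    return &matrix[width * j + i];
}

/* Allocate a zeroed width x height matrix and set its diagonal to 1. */
int *make_identity_like(size_t width, size_t height)
{
    int *matrix = calloc(width * height, sizeof *matrix);
    if (matrix == NULL)
        return NULL;
    for (size_t k = 0; k < width && k < height; ++k)
        *matrix_at(matrix, width, k, k) = 1;
    return matrix;   /* a single free(matrix) releases everything */
}
```

Because the whole matrix is one allocation, the function only needs the base pointer and the width to find any element.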

No, you'd just keep using array[position] = value.
In the end, there's no real difference whether you declare a parameter as int *something or int something[]. Both will work, because array indexing is just some hidden pointer math.
However, there is one difference regarding how the code can be understood:
int array[] always denotes an array (it might be just one element long though).
int *pointer however could be a pointer to a single integer or a whole array of integers.
As far as addressing/representation goes: pointer == array == &array[0]
If you're working with multiple dimensions, things are a little bit different, because C forces you to declare every dimension except the first if you're defining multidimensional arrays explicitly:
int **myStuff1; // valid
int *myStuff2[]; // valid
int myStuff3[][]; // invalid
int myStuff4[][5]; // valid

What does the declaration void** mean in the C language?

I'm beginning to learn C and read following code:
public void** list_to_array(List* thiz){
int size = list_size(thiz);
void **array = malloc2(sizeof(void *) * size);
int i=0;
list_rewind(thiz);
for(i=0; i<size; i++){
array[i] = list_next(thiz);
}
list_rewind(thiz);
return array;
}
I don't understand the meaning of void**. Could someone explain it with some examples?

void** is a pointer to a pointer to void (unspecified type). It means that the variable (memory location) contains an address to a memory location, that contains an address to another memory location, and what is stored there is not specified. In this question's case it is a pointer to an array of void* pointers.
Sidenote: A void pointer can't be dereferenced, but a void** can.
void *a[100];
void **aa = a;
By doing this one should be able to do e.g. aa[17] to get at the 18th element of the array a.
To understand such declarations you can use this tool and might as well check a related question or two.
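A small sketch of the idea, using made-up functions collect_addresses and int_at: the void ** result points at the first element of an array of void *, and each element can hold the address of any kind of object. The caller has to know the real type of an element before dereferencing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Build a heap array of three void * holding the given addresses. */
void **collect_addresses(void *first, void *second, void *third)
{
    void **array = malloc(3 * sizeof *array);   /* sizeof *array == sizeof(void *) */
    if (array == NULL)
        return NULL;
    array[0] = first;    /* indexing a void ** yields a void * */
    array[1] = second;
    array[2] = third;
    return array;
}

/* Read element `index` back, assuming the caller knows it points to an int. */
int int_at(void **array, size_t index)
{
    assert(array != NULL && array[index] != NULL);
    const int *p = array[index];   /* convert void * to the real type first */
    return *p;
}
```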

void** is a pointer to void*, or a pointer to a void pointer if you prefer!
This notation is traditionally used in C to implement a matrix, for example. So, in the matrix case, that would be a pointer to an array of pointers.

Normally void * pointers are used to denote a pointer to an unknown data type. In this case your function returns an array of such pointers thus the double star.
In C, a pointer is often used to reference an array. Eg the following assignment is perfectly legal:
char str1[10];
char *str2 = str1;
Now when void is used, it means that instead of char you have a variable of unknown type.
Pointers to an unknown data type are useful for writing generic algorithms. Eg. the qsort function in standard C library is defined as:
void qsort ( void * base,
size_t num,
size_t size,
int ( * comparator )
( const void *, const void * ) );
The sorting algorithm itself is generic, but has no knowledge of the contents of the data. Thus the user has to provide an implementation of a comparator that can deal with it. The algorithm will call the comparator with two pointers to the elements to be compared. These pointers are of void * type, because there is no information about the type of data being sorted.
Take a look at this thread for more examples
http://forums.fedoraforum.org/showthread.php?t=138213

Void pointers are used to hold the address of any data type. void** means a pointer to a void pointer. Void pointers are used where we want a function to receive different types of data as its argument. Please check the example below.
void func_for_int(void *int_arg)
{
int *ptr = (int *)int_arg;
//some code
}
void func_for_char(void *char_arg)
{
char *ptr = (char *)char_arg;
//some code
}
void common_func(void * arg, void (*func)(void *arg))
{
func(arg);
}
int main()
{
int a = 10;
char b = 5;
common_func((void *)&a, func_for_int);
common_func((void *)&b, func_for_char);
return 0;
}

What does this C syntax mean?

This is from a 'magic' array library that I'm using.
void
sort(magic_list *l, int (*compare)(const void **a, const void **b))
{
qsort(l->list, l->num_used, sizeof(void*),
(int (*)(const void *,const void *))compare);
}
My question is: what on earth is the last argument to qsort doing?
(int (*)(const void *, const void*))compare)
qsort takes int (*comp_fn)(const void *,const void *) as its comparator argument, but this sort function takes a comparator with double pointers. Somehow, the line above converts the double pointer version to a single pointer version. Can someone help explain?

That's exactly what the cast you quoted does: it converts a pointer of type
int (*)(const void **, const void **)
to a pointer of type
int (*)(const void *, const void *)
The latter is what is expected by qsort.
Things like this are encountered rather often in bad-quality code. For example, when someone wants to sort an array of ints, they often write a comparison function that accepts int * pointers
int compare_ints(const int *a, const int *b) {
return (*a > *b) - (*a < *b);
}
and when the time comes to actually call qsort they forcefully cast it to the proper type to suppress the compiler's complaints
qsort(array, n, sizeof *array, (int (*)(const void *,const void *)) compare_ints);
This is a "hack", which leads to undefined behavior. It is, obviously, a bad practice. What you see in your example is just a less direct version of the same "hack".
The proper approach in such cases would be to declare the comparison function as
int compare_ints(const void *pa, const void *pb) {
int a = *(const int *) pa;
int b = *(const int *) pb;
return (a > b) - (a < b);
}
and then use it without any casts
qsort(array, n, sizeof *array, compare_ints);
In general, if one expects their comparison functions to be used as comparators in qsort (and similar functions), one should implement them with const void * parameters.

The last argument to qsort is casting a function pointer taking double pointers, to one taking single pointers that qsort will accept. It's simply a cast.

On most hardware you can assume that pointers all look the same at the hardware level. For example, in a system with flat 64bit addressing pointers will always be a 64bit integer quantity. The same is true of pointers to pointers or pointers to pointers to pointers to pointers.
Therefore, whatever method is used to invoke a function with two pointers will work with any function that takes two pointers. The specific type of the pointers doesn't matter.
qsort treats pointers generically, as though each is opaque. So it doesn't know or care how they're dereferenced. It knows what order they're currently in and uses the compare argument to work out what order they should be in.
The library you're using presumably keeps lists of pointers to pointers about. It has a compare function that can compare two pointers to pointers. So it casts that across to pass to qsort. It's just syntactically nicer than, e.g.
qsort(l->list, l->num_used, sizeof(void*), compare);
/* elsewhere */
int compare(const void *ptr1, const void *ptr2)
{
// these are really pointers to pointers, so convert them across
// (struct item and its sort_key field are hypothetical stand-ins for the element type)
const struct item *const *real_ptr1 = ptr1;
const struct item *const *real_ptr2 = ptr2;
// do whatever with real_ptr1 and 2 here, e.g.
return (*real_ptr2)->sort_key - (*real_ptr1)->sort_key;
}

It is casting a function pointer. I imagine that the reason is so that compare can be applied to the pointers that are dereferenced rather than whatever they are pointing to.

(int (*)(const void *,const void *))compare is a C style cast to cast the function pointer compare to a function pointer with two const void * args.

The last argument is a function pointer. It specifies that it takes a pointer to a function that returns an int and takes two const void ** arguments.