From the man page of qsort, in an example of sorting strings:
static int
cmpstringp(const void *p1, const void *p2)
{
/* The actual arguments to this function are "pointers to
pointers to char", but strcmp(3) arguments are "pointers
to char", hence the following cast plus dereference */
return strcmp(* (char * const *) p1, * (char * const *) p2);
}
Why is it necessary to have char * const * in the arguments to strcmp()? Isn't char * enough?
strcmp is declared as
int strcmp(
const char *string1,
const char *string2
);
This expresses the function's interface contract, namely that strcmp will not modify its input data, and it would let the compiler optimize inside the function (if strcmp were not part of the CRT and probably already written in assembler).
const void* p1 says that whatever p1 points at is not changed by this function. If you did
char** p1_copy = (char**) p1;
that would be a setup to potentially break that promise, because you could then do
*p1_copy = "Something else";
So a cast from const void* to char** is said to "cast away const". Legal, but some compilers will warn if you use a cast to both cast away const and otherwise change the type at once.
The cast that doesn't break the promise of the const void* p1 declaration is the one used:
char* const* p1_arg = (char* const*) p1;
Now *p1_arg, the thing p1 points to, can't be changed just like we said. You could change the characters in it though:
*p1_arg[0] = 'x';
The function declaration never said anything about them, and you say you know them to originally be non-const chars. So it's allowable, even though the function doesn't actually do any such thing.
Then you dereference that (as an rvalue) to get a char*. That can legally be passed as the const char* argument to strcmp by automatic const promotion.
Technically, if you wanted to get rid of the consts, the cast would be to char **, not char *. The const is left in the cast because the arguments to cmpstringp are also const.
A comparison function passed to qsort has no business modifying the items it's comparing.
This is why the general case of qsort looks like:
void qsort(void *base, size_t nmemb, size_t size, int(*compar)(const void *, const void *));
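As a minimal sketch of how that prototype and the man-page comparator fit together when sorting an array of char *: sort_strings is a hypothetical helper name introduced only for illustration, and the comparator is the one quoted from the man page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator in the style of the man page: each argument points at an
   array element, and each element is itself a char *. */
static int cmpstringp(const void *p1, const void *p2)
{
    /* char * const * is a pointer to a const pointer to char, so the
       promise made by const void * is kept. */
    return strcmp(*(char * const *)p1, *(char * const *)p2);
}

/* Hypothetical helper (not a library function): sorts count strings. */
void sort_strings(char **strings, size_t count)
{
    assert(strings != NULL);
    qsort(strings, count, sizeof *strings, cmpstringp);
}
```

Calling sort_strings on an array such as { "pear", "apple", "fig" } reorders the pointers only; the characters themselves are never touched.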
void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void*))
Is there a way to pass, let's say strcmp to qsort without making a helper function?
I was trying to do:
qsort(..., (int (*) (const void*, const void*) (strcmp)));
Your attempt at the cast simply has a misplaced right (closing) parenthesis. The one at the end should be after the type of the cast. So, you can change:
(int (*) (const void*, const void*) (strcmp))
// ^ wrong
to
(int (*) (const void*, const void*)) (strcmp)
// ^ right
Alternatively, although hiding pointer types in typedef aliases is severely frowned-upon, function pointer types are an exception to that guideline. So, it is easier/clearer to define the required type for the qsort comparator first:
typedef int (*QfnCast) (const void*, const void*);
Then, you can cast to that type:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef int (*QfnCast) (const void*, const void*);
int main(void)
{
char list[5][8] = {
"Fred",
"Bob",
"Anna",
"Gareth",
"Joe"
};
qsort(list, 5, 8, (QfnCast)(strcmp));
for (int i = 0; i < 5; ++i) printf("%s\n", list[i]);
return 0;
}
int (*)(const void*, const void*) and int (*)(const char*, const char*) are not compatible function pointer types.
Casting between different, non-compatible function pointer types is explicitly undefined behavior, C17 6.3.2.3/8 emphasis mine:
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.
So if you cast strcmp to something else, you are explicitly invoking undefined behavior. It will likely work in practice on any system where all pointer types are of equal size. But if you are going to rely on that, you might as well cook up something like this:
typedef union
{
int (*strcmp) (const char*, const char*);
int (*compare)(const void*, const void*);
} strcmp_t;
const strcmp_t hack = { strcmp };
...
qsort(str, x, y, hack.compare);
This is just as much undefined behavior (and just as likely to work in practice) but more readable.
You can never do qsort(str, x, y, strcmp) because again strcmp is not compatible with the function pointer type expected by qsort. Function parameter passing is done as per assignment, so the rules of simple assignment are the relevant part, from C17 6.5.16.1:
Constraints
...
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
Therefore qsort(str, x, y, strcmp) is always invalid C and this is not a quality of implementation issue. Rather, compilers letting this through without diagnostics are to be regarded as hopelessly broken.
And finally as noted in comments, strcmp only makes sense to use with bsearch/qsort in case you have a true 2D array of characters such as char str[x][y];. In my experience that's a rather rare use-case. When dealing with strings, you are far more likely to have char* str[x], in which case you must write a wrapper around strcmp anyway.
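For the true 2D array case, a small wrapper avoids the function pointer cast entirely. The following is only a sketch: NAME_LEN, cmp_rows and sort_rows are names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 8 /* hypothetical row width */

/* Each element is a char[NAME_LEN]. Its address is also the address of
   its first character, so the void pointers convert directly. */
static int cmp_rows(const void *a, const void *b)
{
    const char *s1 = a;
    const char *s2 = b;
    return strcmp(s1, s2);
}

/* Hypothetical helper: sorts count rows of a 2D char array in place. */
void sort_rows(char rows[][NAME_LEN], size_t count)
{
    assert(rows != NULL);
    qsort(rows, count, sizeof rows[0], cmp_rows);
}
```

Because cmp_rows has exactly the type qsort expects, no conversion between incompatible function pointer types takes place.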
There are two problems with what you're trying to do.
First, strcmp has type int (*)(const char *, const char *). This type is incompatible with the type int (*)(const void*, const void*) expected by the function because the parameter types are not compatible. This will result in qsort calling strcmp via an incompatible pointer type, and doing so triggers undefined behavior.
This might work if char * and void * have the same representation, but there's no guarantee this will be the case.
The second problem is that even if the call "works", what's ultimately being passed to strcmp isn't actually a char * but a char **. This means that strcmp will be attempting to read a char * value as if it were a sequence of char values.
So you have to use a helper function to get the results you want:
int compare(const void *a, const void *b)
{
    /* a and b point at array elements, and each element is a char * */
    const char * const *s1 = a;
    const char * const *s2 = b;
    return strcmp(*s1, *s2);
}
As @some programmer dude has already stated, it depends on what you're sorting. If it's a 2D array of char (an array of fixed-size strings), you can use strcmp without a helper function and do a cast to avoid ugly warnings:
char s_array[100][100] = { "z", "a", ... };
qsort( s_array, 100, 100, (int (*)(const void *, const void *))strcmp );
If it's an array of pointers you need a helper function because it gets passed pointers to pointers:
char *p_array[100] = { "z", "a", ... };
int cmp( const void *p1, const void *p2 )
{
return strcmp( *(const char **)p1, *(const char **)p2 );
}
qsort( p_array, 100, sizeof *p_array, cmp );
I'm taking a specialization on Coursera and in a lesson it explains the qsort() function that sorts a given array:
void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
where we should provide qsort() with four parameters - the array to sort, number of elements in the array, size of each element of the array, and a pointer to a function (compar) which takes two const void *s and returns an int. The lesson says that we need to write the compar function to be compatible with the qsort function, so if we would like to compare two strings the function should look like:
int compareStrings(const void * s1vp, const void * s2vp) {
// first const: s1vp actually points at (const char *)
// second const: cannot change *s1vp (is a const void *)
const char * const * s1ptr = s1vp;
const char * const * s2ptr = s2vp;
return strcmp(*s1ptr, *s2ptr);
}
void sortStringArray(const char ** array, size_t nelements) {
qsort(array, nelements, sizeof(const char *), compareStrings);
}
It says: Note that the pointers passed in are pointers to the elements in the array (that is, they point at the boxes in the array), even though those elements are themselves pointers (since they are strings). When we convert them from void *s, we must take care to convert them to the correct type—here, const char * const *—and use them appropriately, or our function will be broken in some way. For example, consider the following broken code:
// BROKEN DO NOT DO THIS!
int compareStrings(const void * s1vp, const void * s2vp) {
const char * s1 = s1vp;
const char * s2 = s2vp;
return strcmp(s1, s2);
}
The thing that I can't really get is why didn't we consider s1vp and s2vp as pointers to pointers? I mean, since the arguments passed to the function compareStrings are addresses of pointers pointing to strings (address of pointer), shouldn't we have declared s1vp and s2vp as int compareStrings(const void ** s1vp, const void ** s2vp) since they are receiving addresses of pointers?
In other words, I'm passing, for example, the address of the first element of the array of strings, which is actually a pointer, to s1vp. So now s1vp is receiving the address of a pointer, not of a plain variable, so we should declare it as a pointer to pointer, right? It gives me a warning when I try to do so...
A void * can point to any datatype. The fact that the datatype in question is also a pointer doesn't change things.
Also, you can't change the signature of the comparison function, otherwise it would be incompatible with what qsort is expecting and can lead to undefined behavior.
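A short sketch of that point: the comparator signature stays the same whether the elements are plain ints or pointers, and only the conversion inside the function changes. All names here (cmp_int, cmp_int_ptr, sort_ints, sort_int_ptrs) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Elements are int, so each const void * really points at an int. */
static int cmp_int(const void *a, const void *b)
{
    const int *x = a;
    const int *y = b;
    return (*x > *y) - (*x < *y); /* avoids the overflow of *x - *y */
}

/* Elements are int *, so each const void * really points at an int *.
   The parameter type is still const void *, not const void **. */
static int cmp_int_ptr(const void *a, const void *b)
{
    int * const *x = a;
    int * const *y = b;
    return (**x > **y) - (**x < **y);
}

void sort_ints(int *values, size_t count)
{
    assert(values != NULL);
    qsort(values, count, sizeof *values, cmp_int);
}

void sort_int_ptrs(int **ptrs, size_t count)
{
    assert(ptrs != NULL);
    qsort(ptrs, count, sizeof *ptrs, cmp_int_ptr);
}
```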
I'm very much confused about the const keyword. I have a function accepting an array of strings as input parameter and a function accepting a variable number of arguments.
void dtree_joinpaths(char* output_buffer, int count, ...);
void dtree_joinpaths_a(char* output_buffer, int count, const char** paths);
dtree_joinpaths internally invokes dtree_joinpaths_a after it has built an array of strings from the argument list.
void dtree_joinpaths(char* output_buffer, int count, ...) {
int i;
va_list arg_list;
va_start(arg_list, count);
char** paths = malloc(sizeof(char*) * count);
for (i=0; i < count; i++) {
paths[i] = va_arg(arg_list, char*);
}
va_end(arg_list);
dtree_joinpaths_a(output_buffer, count, paths);
}
But the gcc compiler gives me the following warning:
src/dtree_path.c: In function 'dtree_joinpaths':
src/dtree_path.c:65: warning: passing argument 3 of 'dtree_joinpaths_a' from incompatible pointer type
When I change char** paths = malloc(sizeof(char*) * count); to const char** paths = malloc(sizeof(char*) * count);, the warning no longer shows up. What I don't understand is why: I thought a pointer to an address can always be casted to a const pointer, but not the other way round (which is what is happening here imo).
This example works: http://codepad.org/mcPCMk3f
What am I doing wrong, or where is my misunderstanding?
Edit
My intent is to make the memory of the input data immutable for the function. (in this case the paths parameter).
The reason char ** -> const char** is a "dangerous" conversion is the following code:
const char immutable[] = "don't modify this";
void get_immutable_str(const char **p) {
*p = immutable;
return;
}
int main() {
char *ptr;
get_immutable_str(&ptr); // <--- here is the dangerous conversion
ptr[0] = 0;
}
The above code attempts to modify a non-modifiable object (the global array of const char), which is undefined behavior. There is no other candidate in this code for something to define as "bad", so const-safety dictates that the pointer conversion is bad.
C compilers traditionally accept the conversion with only a diagnostic, which is why gcc warns you here rather than rejecting the code. FYI, C++ does forbid the conversion; it has stricter const-safety than C.
I would have used a string literal for the example, except that string literals in C are "dangerous" to begin with -- you're not allowed to modify them but they have type array-of-char rather than array-of-const char. This is for historical reasons.
I thought a pointer to an address can always be casted to a const pointer
A pointer-to-non-const-T can be converted to a pointer-to-const-T. char ** -> const char** isn't an example of that pattern, because if T is char * then const T is char * const, not const char * (at this point it's probably worthwhile not writing the const on the left any more: write char const * and you won't expect it to be the same as T const where T is char *).
You can safely convert char ** to char * const *, and (for reasons that require a little more than just the simple rule) you can safely convert char ** to char const * const *.
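The two safe conversions can be sketched as follows. The function names are hypothetical, and the explicit cast in the second step is there on the assumption that a C compiler would otherwise diagnose the implicit form (C++ accepts it without a cast).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Promises to modify neither the pointers nor the characters. */
size_t total_length(const char *const *items, size_t count)
{
    size_t total = 0;
    assert(items != NULL);
    for (size_t i = 0; i < count; ++i)
        total += strlen(items[i]);
    return total;
}

size_t total_length_of(char **items, size_t count)
{
    /* char ** -> char * const *: const is added to the pointed-to type
       only, so C accepts this implicitly. */
    char *const *fixed = items;

    /* char * const * -> const char * const *: also safe, but written
       with an explicit cast in C. */
    return total_length((const char *const *)fixed, count);
}
```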
The key is that the pointer itself is not const. To declare a const pointer, use char *const ptr; or to declare a const pointer to a const pointer, char *const *const ptr;. const char **ptr is a pointer to pointer to const char.
Actually, if there is a function that accepts a const char** and you pass it a char**, this can lead to a problematic situation, and vice versa.
In your specific case you expect the memory to be immutable, but it is not immutable and may change at any time. In a multithreaded environment you would expect this memory to be thread safe, and as long as it lives on the stack or heap, you wouldn't need a mutex to access it.
All this is oriented to avoiding errors, but if you are sure that this wouldn't lead to an error you can simply cast the pointer to const char** .
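An alternative to casting, sketched below, is to build the array as const char ** from the start so that no conversion from char ** is needed. The body of dtree_joinpaths_a is not shown in the question, so the version here (joining with '/') is purely an assumption for illustration, and the caller is assumed to supply a large enough output_buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical body: joins the paths with '/'. The question does not
   show the real implementation. */
void dtree_joinpaths_a(char *output_buffer, int count, const char **paths)
{
    assert(output_buffer != NULL);
    output_buffer[0] = '\0';
    for (int i = 0; i < count; i++) {
        if (i > 0)
            strcat(output_buffer, "/");
        strcat(output_buffer, paths[i]);
    }
}

void dtree_joinpaths(char *output_buffer, int count, ...)
{
    va_list arg_list;
    /* Declared as const char ** from the start, so the call below
       passes exactly the type dtree_joinpaths_a expects. */
    const char **paths = malloc(sizeof *paths * (size_t)count);

    if (paths == NULL) {
        output_buffer[0] = '\0';
        return;
    }
    va_start(arg_list, count);
    for (int i = 0; i < count; i++)
        paths[i] = va_arg(arg_list, char *); /* char * -> const char * is fine */
    va_end(arg_list);

    dtree_joinpaths_a(output_buffer, count, paths);
    free(paths);
}
```

This sketch also frees the temporary array, which the code in the question does not do.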
You cannot pass char ** into const char ** because the compiler cannot guarantee const correctness.
Suppose you had the following code (and it compiled):
void foo(const char **ppc, const char* pc)
{
*ppc = pc; // Assign const char* to const char*
}
void bar()
{
const char c = 'x';
char* pc;
foo(&pc, &c); // Illegal; converting char** to const char**. Will set pc == &c
*pc = 'X'; // Ooops! That changed c.
}
See here for the same example without the function calls.
Through trial and error I managed to get the following string comparison function to work with qsort() as I intended but I don't really understand why the asterisk is needed in the (const char*) cast expression. Can someone please dissect and explain:-
int strCompare(const void *a, const void *b) {
return strcmp((const char*)a, (const char*)b);
}
Appendix:-
void findStrings(int * optionStats, char strings[][MAX_STRING_SIZE + 1], int numStrings)
{
qsort(strings, numStrings, 21*sizeof(char), strCompare);
}
Is there a way of eliminating the call to strcmp() through strCompare() and just using strcmp() as the parameter to qsort() instead?
You need an asterisk because you want to convert a pointer to const void to a pointer to const char and an asterisk designates that they are pointer types.
In fact you don't really need conversion, since pointer to void type can be implicitly converted to pointer to T type in C language, which isn't the case for C++.
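A sketch of the cast-free version for the 2D array in the question, here combined with bsearch to show that the same comparator serves both functions. MAX_STRING_SIZE is assumed to be 20 (the question only shows 21*sizeof(char)), and sortAndFind is a hypothetical function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_SIZE 20 /* assumed value; not given in the question */

static int strCompare(const void *a, const void *b)
{
    const char *s1 = a; /* implicit conversion from const void *, no cast */
    const char *s2 = b;
    return strcmp(s1, s2);
}

/* Hypothetical helper: sorts the rows, then returns the index of key,
   or -1 if it is not present. */
int sortAndFind(char strings[][MAX_STRING_SIZE + 1], int numStrings,
                const char *key)
{
    assert(strings != NULL && key != NULL);
    qsort(strings, (size_t)numStrings, sizeof strings[0], strCompare);

    char (*hit)[MAX_STRING_SIZE + 1] =
        bsearch(key, strings, (size_t)numStrings, sizeof strings[0],
                strCompare);
    return hit ? (int)(hit - strings) : -1;
}
```

Using sizeof strings[0] for the element size keeps the call correct if MAX_STRING_SIZE changes.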
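As others here have mentioned, for a 2D array of char (such as strings in the question) you don't need to define a new function; you can cast the function pointer instead. Here's how to cast strcmp while passing it to qsort, which silences the warning. Strictly speaking, calling strcmp through the converted pointer is undefined behavior, though it works on common platforms. This only applies when the elements are char arrays; for an array of char * you still need a helper function:
qsort(arr,
sizeof(arr)/sizeof(arr[0]),
sizeof(arr[0]),
(int(*)(const void *, const void *))strcmp);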
The signature of strcmp is (there's another one, but this is the one you are using):
int strcmp(const char *s1, const char *s2);
so, as the parameters of your function (a and b) are of type const void *, you have to perform those casts.
This is correct as long as the elements of the array passed to qsort really are char arrays, so that the addresses handed to strCompare point at characters.
Because
int strcmp(
const char *string1,
const char *string2
);
is declared like that. If you do not cast a to const char *, it remains a pointer to void. It may be easier to understand if you write
const void *a as const void* a
so that the asterisk visibly belongs to the data type.
So to cast the variable a to a pointer to const char, you have to use the asterisk too.
So it seems like it means a pointer to a constant pointer to char. That is it points to a char * const, so far so good.
What gets me confused is where and how I saw it used. I was looking at the man page for qsort and the example does the following to convert the pointers to elements of a char ** (an array of strings), (pointers to elements seen as const void *) to normal char pointers feedable to strcmp:
static int
cmpstringp(const void *p1, const void *p2)
{
/* The actual arguments to this function are "pointers to
pointers to char", but strcmp(3) arguments are "pointers
to char", hence the following cast plus dereference */
return strcmp(* (char * const *) p1, * (char * const *) p2);
}
My question is, why is there a cast to char * const *? Why isn't it just a const char ** (because eventually we want to send a const char * to strcmp)?
char * const * indeed means a pointer to a constant pointer to chars. The reason this cast is performed in the code in the question is the following:
p1 and p2 are (non-const) pointers to a constant location. Let's assume the type of this location is const T.
Now we want to cast p1 and p2 to their real types. We know that each element of the array is a char *, therefore T = char *. That is const T is a constant pointer to char, which is written as char * const.
Since p1 and p2 are pointers to the elements of the array, they are of type const T *, which is char * const *.
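The substitution T = char * can be checked mechanically. The sketch below assumes a C11 compiler (for _Static_assert and _Generic); the typedef name T mirrors the text, and compare_elements is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

typedef char *T; /* the element type of the array being sorted */

/* Compile-time check: const T * and char * const * are the same type. */
_Static_assert(_Generic((const T *)0, char *const *: 1, default: 0),
               "const T * should be char * const *");

/* It is not const char **, which is a different type. */
_Static_assert(_Generic((const T *)0, const char **: 0, default: 1),
               "const T * should not be const char **");

int compare_elements(const void *p1, const void *p2)
{
    const T *e1 = p1; /* i.e. char * const * */
    const T *e2 = p2;
    assert(e1 != NULL && e2 != NULL);
    return strcmp(*e1, *e2);
}
```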
Since the function merely calls strcmp, in truth it wouldn't have made any difference if the parameters were cast to char ** or const char ** or const char * const * or whatever.
When a function declares that it takes a pointer to const elements (e.g. strcmp()), it means that the function promises not to modify the elements via the pointer; it does not mean that the arguments passed to that function must be pointers to const themselves.
Remember: the const modifier is a contract term, basically meaning that the declaring function promises not to modify the element the const modifies. Conversion in the direction of non-const -> const therefore is usually OK.
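A minimal illustration of that direction of conversion, with count_char as a made-up function: the caller's buffer is writable, and the function merely promises not to write through its own pointer.

```c
#include <assert.h>
#include <stddef.h>

/* const here is the function's promise not to write through s. */
size_t count_char(const char *s, char wanted)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (*s == wanted)
            ++n;
    return n;
}
```

A caller holding a plain char text[] can pass it directly; the char * converts to const char * implicitly, and the caller remains free to modify text afterwards.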