How to implement strstr() without casting away const? - c

strstr is a C99-conforming function, the type signature of which is the following:
char *strstr(const char *haystack, const char *needle);
Is it possible to implement this function without casting away a const somewhere?
For reference, here is Apple's implementation, and here is GNU's implementation. Both cast away the const at the end.

You can't implement strstr() without violating const correctness somehow. A cast is the most straightforward way to do that. You can probably hide the violation somehow (for example you could use memcpy() to copy the pointer value), but there's no point in doing so.
The problem is that strstr() takes a const char* that points to a string, and returns a non-const char* that points into the same string.
For example, this program:
#include <stdio.h>
#include <string.h>
int main(void) {
const char s[] = "hello";
char *result = strstr(s, "hello");
*result = 'H';
puts(result);
}
modifies (or at least attempts to modify) a const-qualified object, without using a pointer cast or any other obviously unsafe construct.
Back in 1989, the ANSI C committee could have avoided this problem by defining two different functions, say:
const char *strcstr(const char *haystack, const char *needle);
char *strstr ( char *haystack, const char *needle);
one that returns a pointer to a const char given a const arguments, and another that returns pointer to a modifiable char given a modifiable argument. (C++, which inherits the C standard library, does this by overloading.)
strstr() is one of several standard string functions that have this problem.

What you are seeing in case of strstr (and some other standard functions) is an idiomatic solution, whose intent is to support non-modifying operations on both const and non-const strings with a single function.
Yes, in order to implement such a function it necessary to explicitly cast away constness from the input pointer. Note that the cast by itself does not yet physically violate anything (meaning: it does not attempt to modify constant data and does not cause undefined behavior). Yet, it does open the door to such violations by returning a non-const pointer to const data to the caller, i.e. violates the conceptual rules of const-correctness.
An alternative solution would be to provide two versions of the same standard function - a const one and a non-const one - but without C++-style function overloading this would require two distinctive function names, which does not necessarily look like a very good idea (and even with C++-style function overloading it has its issues too).
In this case it is effectively the responsibility of the caller to receive the result into a pointer of the same type as the one that was passed in as an argument. If the argument pointer was const-qualified, it is a very good programming practice to receive the result into a pointer that is const-qualified as well. As long as this guideline is observed, this idiom actually feels right at home in the realm of C language.

Related

Why are strings in C declared with 'const'?

For example, why not:
char *s= "example";
instead of:
const char *s= "example";
I understand that const makes it unchangeable, but why do I receive an error when compiling the first?
Additionally, how does the concept apply to
int * x;
vs
const int *x;
I see the second used a lot more, is it good practice to use "cons int *"?
There's no requirement to use const, but it's a good idea.
In C, a string literal is an expression of type char[N], where N is the length of the string plus 1 (for the terminating '\0' null character). But attempting to modify the array that corresponds to the string literal has undefined behavior. Many compilers arrange for that array to be stored in read-only memory (not physical ROM, but memory that's marked read-only by the operating system). (An array expression is, in most contexts converted to a pointer expression referring to the initial element of the array object.)
It would have made more sense to make string literals const, but the const keyword did not exist in old versions of C, and it would have broken existing code. (C++ did make string literals const).
This:
char *s= "example"; /* not recommended */
is actually perfectly valid in C, but it's potentially dangerous. If, after this declaration, you do:
s[0] = 'E';
then you're attempting to modify the string literal, and the behavior is undefined.
This:
const char *s= "example"; /* recommended */
is also valid; the char* value that results from evaluating the string literal is safely and quietly converted to const char*. And it's generally better than the first version because it lets the compiler warn you if you attempt to modify the string literal (it's better to catch errors at compile time than at run time).
If you get an error on your first example, then it's likely that you're inadvertently compiling your code as C++ rather than as C -- or that you're using gcc's -Wwrite-strings option or something similar. (-Wwrite-strings makes string literals const; it can improve safety, but it can also cause gcc to reject, or at least warn about, valid C code.)
With Visual Studio 2015 at warning level 4, this compiles and runs whether compiled as C or C++:
#include <stdio.h>
char *s1= "example\n";
const char *s2= "example\n";
int main(int argc, char **argv)
{
printf(s1); // prints "example"
s1[2] = 'x';
printf(s1); // prints "exxmple"
printf(s2);
return 0;
}
If I add this line, it will fail to compile as C or C++ with every compiler I know of:
s2[2] = 'x'; // produces compile error
This is the error the const keyword is designed to avoid. It simply tells the compiler not to allow assignments to the object pointed to.
It doesn't matter if your pointer points to char or int or anything else. The const keyword has the same effect on all pointers, and that's to make it impossible (well, very hard) to assign to the thing declared const.
A string literal used as a value compiles to an array of char that should not be modified. Attempting to modify it invokes undefined behavior. For historical reasons of backward compatibility, its type is char [] although is really should be const char []. You can enable extra compiler warnings to change this and instruct the compiler to consider such strings to be const.

When to use const void*?

I have this very simple test function that I'm using to figure out what's going on with the const qualifier.
int test(const int* dummy)
{
*dummy = 1;
return 0;
}
This one throws me an error with GCC 4.8.3.
Yet this one compiles:
int test(const int* dummy)
{
*(char*)dummy = 1;
return 0;
}
So it seems like the const qualifier works only if I use the argument without casting to another type.
Recently I've seen codes that used
test(const void* vpointer, ...)
At least for me, when I used void *, I tend to cast it to char for pointer arithmetic in stacks or for tracing. How can const void prevent subroutine functions from modifying the data at which vpointer is pointing?
const int *var;
const is a contract. By receiving a const int * parameter, you "tell" the caller that you (the called function) will not modify the objects the pointer points to.
Your second example explicitly breaks that contract by casting away the const qualifier and then modifying the object pointed by the received pointer. Never ever do this.
This "contract" is enforced by the compiler. *dummy = 1 won't compile. The cast is a way to bypass that, by telling the compiler that you really know what you are doing and to let you do it. Unfortunately the "I really know what I am doing" is usually not the case.
const can also be used by compiler to perform optimization it couldn't otherwise.
Undefined Behavior note:
Please note that while the cast itself is technically legal, modifying a value declared as const is Undefined Behavior. So technically, the original function is ok, as long as the pointer passed to it points to data declared mutable. Else it is Undefined Behavior.
more about this at the end of the post
As for motivation and use lets take the arguments of strcpy and memcpy functions:
char* strcpy( char* dest, const char* src );
void* memcpy( void* dest, const void* src, std::size_t count );
strcpy operates on char strings, memcpy operates on generic data. While I use strcpy as example, the following discussion is exactly the same for both, but with char * and const char * for strcpy and void * and const void * for memcpy:
dest is char * because in the buffer dest the function will put the copy. The function will modify the contents of this buffer, thus it is not const.
src is const char * because the function only reads the contents of the buffer src. It doesn't modify it.
Only by looking at the declaration of the function, a caller can assert all the above. By contract strcpy will not modify the content of the second buffer passed as argument.
const and void are orthogonal. That is all the discussion above about const applies to any type (int, char, void, ...)
void * is used in C for "generic" data.
Even more on Undefined Behavior:
Case 1:
int a = 24;
const int *cp_a = &a; // mutabale to const is perfectly legal. This is in effect
// a constant view (reference) into a mutable object
*(int *)cp_a = 10; // Legal, because the object referenced (a)
// is declared as mutable
Case 2:
const int cb = 42;
const int *cp_cb = &cb;
*(int *)cp_cb = 10; // Undefined Behavior.
// the write into a const object (cb here) is illegal.
I began with these examples because they are easier to understand. From here there is only one step to function arguments:
void foo(const int *cp) {
*(int *)cp = 10; // Legal in case 1. Undefined Behavior in case 2
}
Case 1:
int a = 0;
foo(&a); // the write inside foo is legal
Case 2:
int const b = 0;
foo(&b); // the write inside foo causes Undefined Behavior
Again I must emphasize: unless you really know what you are doing, and all the people working in the present and in the future on the code are experts and understand this, and you have a good motivation, unless all the above are met, never cast away the constness!!
int test(const int* dummy)
{
*(char*)dummy = 1;
return 0;
}
No, this does not work. Casting away constness (with truly const data) is undefined behavior and your program will likely crash if, for example, the implementation put const data in ROM. The fact that "it works" doesn't change the fact that your code is ill-formed.
At least for me, when I used void*, I tend to cast it to char* for
pointer arithmetic in stacks or for tracing. How can const void*
prevent subroutine functions from modifying the data at which vpointer
is pointing?
A const void* means a pointer to some data that cannot be changed. In order to read it, yes, you have to cast it to concrete types such as char. But I said reading, not writing, which, again, is UB.
This is covered more in depth here. C allows you to entirely bypass type-safety: it's your job to prevent that.
It’s possible that a given compiler on a given OS could put some of its const data in read-only memory pages. If so, attempting to write to that location would fail in hardware, such as causing a general protection fault.
The const qualifier just means that writing there is undefined behavior. This means the language standard allows the program to crash if you do (or anything else). Despite that, C lets you shoot yourself in the foot if you think you know what you’re doing.
You can’t stop a subroutine from reinterpreting the bits you give it however it wants and running any machine instruction on them it wants. The library function you’re calling might even be written in assembler. But doing that to a const pointer is undefined behavior, and you really don’t want to invoke undefined behavior.
Off the top of my head, one rare example where it might make sense: suppose you’ve got a library that passes around handle parameters. How does it generate and use them? Internally, they might be pointers to data structures. So that’s an application where you might typedef const void* my_handle; so the compiler will throw an error if your clients try to dereference it or do arithmetic on it by mistake, then cast it back to a pointer to your data structure inside your library functions. It’s not the safest implementation, and you want to be careful about attackers who can pass arbitrary values to your library, but it’s very low-overhead.

Why doesn't GLib use 'const' in these functions?

I have a simple question about GLib.
I have the following code:
static const char *words[] = { "one", "two", "three", NULL };
void main() {
puts(g_strjoinv("+", words));
}
This code prints one+two+three. It uses a GLib function which joins strings.
The signature of that function is:
char *g_strjoinv (const char *separator, char **str_array);
(To be exact, GLib uses gchar, not char, but let's ignore this.)
Now, I wonder why the parameter is char **str_array and not const char **str_array. It forces me to do an explicit cast to get rid of the compiler's warning ("expected 'char **' but argument is of type 'const char **'"):
puts(g_strjoinv("+", (char **)words));
I look in GLib's reference and I see that all the functions there are defined this way: they accept char **, not const char **.
Why is that? Why doesn't GLib use const char **?
The need to use an explicit cast to get rid of the const makes my code less safe (because the compiler no longer checks the arguments' compatibility). It also makes me nervous because GLib doesn't "sign a contract" that say it won't change my data.
The assumption in your question is that if the second argument of g_strjoinv() was declared as type const char **, then you can just as easily pass either a const char ** or a char ** to it.
Unfortunately, this isn't true. As explained in this question from the comp.lang.c FAQ, using a pointer to T (for any type T) where a pointer to const T is expected is OK, but only where the mismatch is at the top level of indirection. That is to say, you can pass a char * where a const char * is expected, but you cannot (without a cast) pass a char ** where a const char ** is expected, because in this case the mismatch is at the second level of indirection. The linked question explains in some detail why it works this way, which is "somewhat obscure".
So the upshot is there there is no type (in C) which will accept both a char ** and a const char ** without at least one of them requiring a cast. This being the case, there's probably not much reason to prefer either other than convenience, and the library authors apparently decided to go with the former. As a matter of pure speculation, I'd guess that wanting to pass a char ** is slightly more common than wanting to pass a const char **, and so the option they picked is probably more convenient.
In your case, you can just drop the const qualifier from the definition of your array. It might seem like bad form, if you like const, but since you have an array of pointers which point to string literals, your program is overwhelmingly likely to loudly complain and fail if anything attempts to write to them anyway, so you're not really losing safety in any meaningful way in this particular case. Even though it's undefined behavior to attempt to modify them, string literals have type array of char in C, not array of const char (unlike C++).
Slightly more obscure, but even if you could pass it as a const char **, the function here could still modify the pointers even if it couldn't modify what they point to, and just cause mayhem in a different way, so the promise the function made still wouldn't be guaranteeing that what you pass to the function wouldn't be changed in some way.

returning const char * from a function in C

In my library I have to return a string to the callers. The string I am returning will be a global array, the intended use from the caller is just to read the string. I don't want them to modify it..
Is this the right way to declare the function..
const char * get_some_details();
This should generate a warning (tried only gcc) either when the caller assigns the return value to a char * ptr or assigns to const char * ptr, but later tries to modify it.
I am asking because I would expect functions like getenv() to return const char *. But it returns char *. Is there any gotcha in returning const char * ?
Returning const char* is exactly the right thing to do in these circumstances.
Many older APIs don't use const since they pre-date the introduction of const in C90 (there was no const before then).
That is the right way to do it.
Many standard library functions take or return char * when logically it should be const char *, for historical reasons: pre-C89, there was no const; post-C89, their type signatures could not be changed without breaking application code. As a general rule, do not use the standard library as a style guide; it is very old and contains many things that are no longer considered good practice. (cough gets)

char *p="orkut" vs const char *p="orkut"

char *p="orkut" vs const char *p="orkut"
whats the difference btwn these two...
EDIT
from bjarne stroustrup,3rd edition page 90
void f()
{
char* p="plato";
p[4]='e' // error: assign to const;result is undefined
}
this kind of error cannont be general b caught until run time and implementations differ in their enforcement of this rule
Same with const char *p="plato"
Thats why iam asking the diffrence... Whats the significance of const here..
The const char* variant is correct.
You should not change memory that comes from a string literal (referred to as static storage usually). It is read only memory.
The difference is that the char* variant will allow you to write the syntax to change the data that it points to by dereferencing it. What it actually does though is undefined.
//Option 1:
char *p = "orkut";
*p = 'x';//undefined behavior
//Option 2:
const char *q = "orkut";
*q = 'x';//compiling error
I would rather have option 2 happen to me.
A declaration of const char * p means that the thing p points at is const, ie should not change. I say should not because it is possible to cast away the constness. As has been pointed out, changing a string literal is undefined and often leads to an access violation/segmentation fault.
Not this declaration is different than char * const p which means that p itself it const rather than the thing p points at.
The problem that Stroustrup discusses in what you're quoting is that in C++ a string literal will readily convert to "an rvlaue of type "pointer to char" (4.2/2 "Array-to-pointer conversion"). This is specifically so that the very common idiom of pointing a char* to a literal string wouldn't cause a bazillion programs to fail to compile. (especially when C++ was initially evolving from C).
If you can get away with declaring you pointer as char const* (or the equivalent const char*), you'll help yourself from running into problems like those described in the Stroustrup quote. However, you might well run into irritating problems using the pointer with functions that aren't 100% const correct.
See this question.
Basically,
char *p="orkut";
In this case, p is supposed to be read-only, but this is not enforced by all compilers (or the standard).
note that more recent gcc implementations will not let you do
char *foo = "foo";
they insist on the const (certainly in -wall -werror mode)

Resources