Confusion about const char * declarations in C - c

I know this may seem like a stupid question, but I am really having trouble understanding some particular examples of C code that involve a declaration of type const char *.
The example that got me thinking about it is in the answer at Parsing csv with c. In particular, the function defined as:
const char* getfield(char* line, int num)
{
const char* tok;
for (tok = strtok(line, ";");
tok && *tok;
tok = strtok(NULL, ";\n"))
{
if (!--num)
return tok;
}
return NULL;
}
My understanding of const char * declarations is that it defines a mutable pointer to an immutable array of characters. This array is assigned in the data section at compile time. However in this case (and many other code examples), the const char * declaration is not for a string literal. How can this memory be assigned for this at compile time if the actual value of the string is not known until the function executes?

this
const char* tok;
creates a pointer on the stack. It has an undefined value, it points nowhere
this
for (tok = strtok(line, ";");
then makes it point to somewhere in line (depends on contents)
your comments
My understanding of const char * declarations is that it defines a mutable pointer to an immutable array of characters. This array is assigned in the data section at compile time.
First sentence, almost, it creates a pointer that might point to an immutable array of chars. Like this
const char * msg = "hello world";
But is can also point nowhere
const char * msg = NULL;
Or to a mutable array of characters
char msg[] = "HelloWorld"; // mutable
const char *msgconst = msg;
the second sentence
This array is assigned in the data section at compile time
Yes, if this is a literal. Ie like this example again
const char*msg = "Hello World";
but the msgconst assignment is not like that, the characters are on the stack
Hope this helps
EDIT
The pointer points where it is told to do, making it const or not has zero effect on where the data is. The data is where it is.(!). You need to think about where the data is created, nothing to do with the pointer, the pointer can point at a literal, stack data, static data (mutable in data segment), heap.
Why use const
if the string is immutable then the pointer must be const. C allows you to assign a literal to a non const pointer for historical reasons, c++ does not
You want to make it clear that this string should not be changed. In your code case you really should not mess with the string strtok is processing. Although it returns char* , this is good practice
functions like strlen that do not change the string passed to them declare their arguments as const char*, its part of their contract: we dont change the string
But having the word const does not change anything about storage, and does not save any space anywhere

Related

(C Programming) Making a char ** array like argv

In a program I am writing I made a Tokenize struct that says:
TokenizerT *Tokenize(TokenizerT *str) {
TokenizerT *tok;
*tok->array = malloc(sizeof(TokenizerT));
char * arr = malloc(sizeof(50));
const char *s = str->input_strng;
int i = 0;
char *ds = malloc(strlen(s) + 1);
strcpy(ds, s);
*tok->array[i] = strtok(ds, " ");
while(*tok->array[i]) {
*tok->array[++i] = strtok(NULL, " ");
}
free(ds);
return tok;
}
where TokenizeT is defined as:
struct TokenizerT_ {
char * input_strng;
int count;
char **array[];
};
So what I am trying to do is create smaller tokens out of a large token that I already created. I had issues returning an array so I made array part of the TokenizerT struct so I can access it by doing tok->array. I am getting no errors when I build the program, but when I try to print the tokens I get issues.
TokenizerT *ans;
TokenizerT *a = Tokenize(tkstr);
char ** ab = a->array;
ans = TKCreate(ab[0]);
printf("%s", ans->input_strng);
TKCreate works because I use it to print argv but when i try to print ab it does not work. I figured it would be like argv so work as well. If someone can help me it would be greatl appreciated. Thank you.
Creating the Tokenizer
I'm going to go out on a limb, and guess that the intent of:
TokenizerT *tok;
*tok->array = malloc(sizeof(TokenizerT));
char * arr = malloc(sizeof(50));
was to dynamically allocate a single TokenizerT with the capacity to contain 49 strings and a NULL endmarker. arr is not used anywhere in the code, and tok is never given a value; it seems to make more sense if the values are each shifted one statement up, and corrected:
// Note: I use 'sizeof *tok' instead of naming the type because that's
// my style; it allows me to easily change the type of the variable
// being assigned to. I leave out the parentheses because
// that makes sure that I don't provide a type.
// Not everyone likes this convention, but it has worked pretty
// well for me over the years. If you prefer, you could just as
// well use sizeof(TokenizerT).
TokenizerT *tok = malloc(sizeof *tok);
// (See the third section of the answer for why this is not *tok->array)
tok->array = malloc(50 * sizeof *tok->array);
(tok->array is not a great name. I would have used tok->argv since you are apparently trying to produce an argument vector, and that's the conventional name for one. In that case, tok->count would probably be tok->argc, but I don't know what your intention for that member is since you never use it.)
Filling in the argument vector
strtok will overwrite (some) bytes in the character string it is given, so it is entirely correct to create a copy (here ds), and your code to do so is correct. But note that all of the pointers returned by strtok are pointers to character in the copy. So when you call free(ds), you free the storage occupied by all of those tokens, which means that your new freshly-created TokenizerT, which you are just about to return to an unsuspecting caller, is full of dangling pointers. So that will never do; you need to avoid freeing those strings until the argument vector is no longer needed.
But that leads to another problem: how will the string be freed? You don't save the value of ds, and it is possible that the first token returned by strtok does not start at the beginning of ds. (That will happen if the first character in the string is a space character.) And if you don't have a pointer to the very beginning of the allocated storage, you cannot free the storage.
The TokenizerT struct
char is a character (usually a byte). char* is a pointer to a character, which is usually (but not necessarily) a pointer to the beginning of a NUL-terminated string. char** is a pointer to a character pointer, which is usually (but not necessarily) the first character pointer in an array of character pointers.
So what is char** array[]? (Note the trailing []). "Obviously", it's an array of unspecified length of char**. Because the length of the array is not specified, it is an "incomplete type". Using an incomplete array type as the last element in a struct is allowed by modern C, but it requires you to know what you're doing. If you use sizeof(TokenizerT), you'll end up with the size of the struct without the incomplete type; that is, as though the size of the array had been 0 (although that's technically illegal).
At any rate, that wasn't what you wanted. What you wanted was a simple char**, which is the type of an argument vector. (It's not the same as char*[] but both of those pointers can be indexed by an integer i to return the ith string in the vector, so it's probably good enough.)
That's not all that's wrong with this code, but it's a good start at fixing it. Good luck.

Difference between char* and const char* [duplicate]

What's the difference between
char* name
which points to a constant string literal, and
const char* name
char* is a mutable pointer to a mutable character/string.
const char* is a mutable pointer to an immutable character/string. You cannot change the contents of the location(s) this pointer points to. Also, compilers are required to give error messages when you try to do so. For the same reason, conversion from const char * to char* is deprecated.
char* const is an immutable pointer (it cannot point to any other location) but the contents of location at which it points are mutable.
const char* const is an immutable pointer to an immutable character/string.
char *name
You can change the char to which name points, and also the char at which it points.
const char* name
You can change the char to which name points, but you cannot modify the char at which it points.
correction: You can change the pointer, but not the char to which name points to (https://msdn.microsoft.com/en-us/library/vstudio/whkd4k6a(v=vs.100).aspx, see "Examples"). In this case, the const specifier applies to char, not the asterisk.
According to the MSDN page and http://en.cppreference.com/w/cpp/language/declarations, the const before the * is part of the decl-specifier sequence, while the const after * is part of the declarator.
A declaration specifier sequence can be followed by multiple declarators, which is why const char * c1, c2 declares c1 as const char * and c2 as const char.
EDIT:
From the comments, your question seems to be asking about the difference between the two declarations when the pointer points to a string literal.
In that case, you should not modify the char to which name points, as it could result in Undefined Behavior.
String literals may be allocated in read only memory regions (implementation defined) and an user program should not modify it in anyway. Any attempt to do so results in Undefined Behavior.
So the only difference in that case (of usage with string literals) is that the second declaration gives you a slight advantage. Compilers will usually give you a warning in case you attempt to modify the string literal in the second case.
Online Sample Example:
#include <string.h>
int main()
{
char *str1 = "string Literal";
const char *str2 = "string Literal";
char source[] = "Sample string";
strcpy(str1,source); //No warning or error, just Undefined Behavior
strcpy(str2,source); //Compiler issues a warning
return 0;
}
Output:
cc1: warnings being treated as errors
prog.c: In function ‘main’:
prog.c:9: error: passing argument 1 of ‘strcpy’ discards qualifiers from pointer target type
Notice the compiler warns for the second case but not for the first.
char mystring[101] = "My sample string";
const char * constcharp = mystring; // (1)
char const * charconstp = mystring; // (2) the same as (1)
char * const charpconst = mystring; // (3)
constcharp++; // ok
charconstp++; // ok
charpconst++; // compile error
constcharp[3] = '\0'; // compile error
charconstp[3] = '\0'; // compile error
charpconst[3] = '\0'; // ok
// String literals
char * lcharp = "My string literal";
const char * lconstcharp = "My string literal";
lcharp[0] = 'X'; // Segmentation fault (crash) during run-time
lconstcharp[0] = 'X'; // compile error
// *not* a string literal
const char astr[101] = "My mutable string";
astr[0] = 'X'; // compile error
((char*)astr)[0] = 'X'; // ok
In neither case can you modify a string literal, regardless of whether the pointer to that string literal is declared as char * or const char *.
However, the difference is that if the pointer is const char * then the compiler must give a diagnostic if you attempt to modify the pointed-to value, but if the pointer is char * then it does not.
CASE 1:
char *str = "Hello";
str[0] = 'M' //Warning may be issued by compiler, and will cause segmentation fault upon running the programme
The above sets str to point to the literal value "Hello" which is hard-coded in the program's binary image, which is flagged as read-only in memory, means any change in this String literal is illegal and that would throw segmentation faults.
CASE 2:
const char *str = "Hello";
str[0] = 'M' //Compile time error
CASE 3:
char str[] = "Hello";
str[0] = 'M'; // legal and change the str = "Mello".
The question is what's the difference between
char *name
which points to a constant string literal, and
const char *cname
I.e. given
char *name = "foo";
and
const char *cname = "foo";
There is not much difference between the 2 and both can be seen as correct. Due to the long legacy of C code, the string literals have had a type of char[], not const char[], and there are lots of older code that likewise accept char * instead of const char *, even when they do not modify the arguments.
The principal difference of the 2 in general is that *cname or cname[n] will evaluate to lvalues of type const char, whereas *name or name[n] will evaluate to lvalues of type char, which are modifiable lvalues. A conforming compiler is required to produce a diagnostics message if target of the assignment is not a modifiable lvalue; it need not produce any warning on assignment to lvalues of type char:
name[0] = 'x'; // no diagnostics *needed*
cname[0] = 'x'; // a conforming compiler *must* produce a diagnostic message
The compiler is not required to stop the compilation in either case; it is enough that it produces a warning for the assignment to cname[0]. The resulting program is not a correct program. The behaviour of the construct is undefined. It may crash, or even worse, it might not crash, and might change the string literal in memory.
The first you can actually change if you want to, the second you can't. Read up about const correctness (there's some nice guides about the difference). There is also char const * name where you can't repoint it.
Actually, char* name is not a pointer to a constant, but a pointer to a variable. You might be talking about this other question.
What is the difference between char * const and const char *?
I would add here that the latest compilers, VS 2022 for instance, do not allow char* to be initialized with a string literal. char* ptr = "Hello"; throws an error whilst const char* ptr = "Hello"; is legal.

const char * VS char const * const (Not about what is const)

So, I know the differences between char const *, char * const, and char const * const. Those being:
char* the_string : I can change the char to which the_string points,
and I can modify the char at which it points.
const char* the_string : I can change the char to which the_string
points, but I cannot modify the char at which it points.
char* const the_string : I cannot change the char to which the_string
points, but I can modify the char at which it points.
const char* const the_string : I cannot change the char to which
the_string points, nor can I modify the char at which it points.
(from const char * const versus const char *?)
Now, my question is: Let's say I'm writing a function that would not modify the C string that is passed to it, for example:
int countA(??? string) {
int count = 0;
int i;
for (i=0; i<strlen(string); i++) {
if (string[i] == 'A') count++;
}
return count;
}
Now, what should the header be?
int countA(char const * string);
int countA(char const * const string);
My feeling is that I should use the second one, because I'm not going to modify the pointer itself, neither the contents of the array. But when I look to the header of standard functions they use the first one. Example
char * strcpy ( char * destination, const char * source );
Why?
(In fact char const * doesn't really make sense to me, because if you're thinking about the abstract string, either you are not modifying the string (so, char const * const because you are not modifying the pointer, neither the contents) or you will modify the string (so just char * because, you may modify the contents and you may need to allocate more memory, so you may need to modify the pointer)
I hope someone can make all this clear to me. Thanks.
In this case, it does not matter whether the pointer itself is const or not, because it is passed by-value anyway: Whatever strcpy does to source will not affect the caller's variable, because strcpy will operate on a copy of the caller's source on the stack, not the original. Note I am talking about the pointer value, not what the pointer points to, which should obviously not be changed since it is the source parameter.
char dest[10];
char const * source = "Hello";
strcpy( dest, source );
// no matter what strcpy does, the value of source will be unchanged
Within strcpy, you need to iterate pointers over the arrays pointed to by destination and source anyway. Not declaring the parameters as const allows the function to use the values from the stack directly, without copying / casting them first.
Having const char * represents a contract. It's a promise the function will not use that pointer to modify the contents passed by the user. Whether the pointer itself is constant or not is less valuable.
So, for the caller, whether the pointer itself is const or not makes no difference whatsoever.
In many implementations "string" functions regularly modify the passed pointer without modifying the contents. If the spec (the C standard) would mandate the pointer itself to be constant, it would be a limiting factor for all implementations, without providing any benefit for the caller.
As a side note, I think you shouldn't make everything const and then work your way around it. Just make stuff const if it feels the function shouldn't have reason to change it. In short, don't go const-mad, it's not a silver bullet and can be a recipe for frustration.
When you declare a function parameter as const char * const, it is not different to your callers from const char *: they could not care less what you do with that pointer, because to them it is all "pass by value" anyway.
The second const is there for you, not for your users. If you know that you are not going to modify that pointer, then by all means declare it const char * const: it is going to help you and others who maintain your code to catch errors later.
As for the standard library, my guess is that they did not want to make it const char * const because they wanted an ability to modify the poitners:
char * strcpy ( char * destination, const char * source ) {
char *res = destination;
while (*destination++ = *source++)
;
return res;
}
The non-defining declarations:
int countA(char const * string);
int countA(char const * const string);
int countA(char const * foobar);
int countA(char const *);
are all equivalent. The string parameter names (in effect) a local variable inside the implementation of countA. It's none of the caller's business whether the implementation modifies that variable or not, and it doesn't affect the signature of the function.
It is the caller's business whether the function modifies the referand of string, so the first const is important. It is slightly the caller's business what the variable is named, but only because the convention is to name parameters in declarations as a hint what they're used for. Documentation is the way to completely convey the meaning of each parameter, not its name.
If you want a rule: omit the second const, because it clutters the declaration while telling the caller nothing of any use.
You could include it in the function definition, but there are some problems with doing so. You either need to keep the header up to date with the definition (in which case you'll sometimes find yourself changing the header due to a change in implementation details that doesn't affect callers). Or else you have to accept that the header doesn't match the definition, which will occasionally upset people who see:
int countA(char const * const string) {
return 0;
}
and search the headers for int countA(char const * const string), or vice-versa see the header and search the source. They need a smarter search term.
char const *s : s is a pointer to const char.
char *const s : s is a constant pointer to char.
When s is a function parameter, the first notation is more useful than the second.
With char const *, you can't modify the pointed value.
With char *const, you can't modify the value of your pointer. It is like int const in function parameters : you can't do operations directly onto your parameter. It doesn't change anything for the call function. (now, it's almost useless for the compiler, but it's meaningful for programmers)

Accessing a C string from fixed Flash location

I have define a string at location 0xAABB:
const char str[] = "Hi There";
const word _Str1 #0xAABB = (word)str;
Now, I want to access this string located at 0xAABB.
What is the C syntax for this?
Just use a pointer to the memory location:
const char *str = (const char *)0xAABB;
Edit: (based on your comment) You need to first get the address from location 0xAABB and then use this value as the string's address:
const char *str = *(const char **)0xAABB;
This will dereference the pointer at 0xAABB to get the actual string address.
#define ADDRESS_OF_YOUR_STRING ((const char *)0xAABB)
const char *pointer_to_your_string = (const char *)0xAABB;
I could have used the macro in the pointer definition, but these are really intended to be separate demonstrations of how to do this. If you put the macro version in a header file and use it in your code the compiler will be able to optimize a little better. If you use a global pointer variable to access it then you may end up with the compiler needing to do an extra load of a pointer.
Using both of these is the same:
printf("%s\n", ADDRESS_OF_YOUR_STRING);
should work just fine since printf expects a pointer to a string for %s.

How do I properly turn a const char* returned from a function into a const char** in C?

In short, I would like to do this:
const char **stringPtr = &getString();
However, I understand that you can't & on rvalues. So I'm stuck with this:
const char *string = getString();
const char **stringPtr = &string;
I can live with two lines. Am I introducing problems with this hack? I should have no fear of passing stringPtr out of the function it is declared in, right?
Edit: My apologies for not originally including the full context.
I have taken on the summer project of building a video game from the ground up in C using OpenGL for graphics. I'm reading configuration data from a text file using libconfig.
One of the convenience functions for finding a specific string from your configuration file looks like this:
int config_setting_lookup_string(const config_setting_t *setting,
const char *name, const char **value)
{
config_setting_t *member = config_setting_get_member(setting, name);
if(! member)
return(CONFIG_FALSE);
if(config_setting_type(member) != CONFIG_TYPE_STRING)
return(CONFIG_FALSE);
*value = config_setting_get_string(member);
return(CONFIG_TRUE);
}
The way that value is assigned means that if you give the function an uninitialized value, it attempts to derefence undefined garbage, which pretty much always causes me a segfault. My current workaround for this issue is to initialize value to another pointer first, like so:
const char *dummyPtr;
const char **fileName = &dummyPtr;
config_setting_lookup_string(foo, "bar", fileName);
So I am trying to figure out the best way to rewrite the last part of the function so that I won't have to perform this two-step initialization. I was thinking that the changed function would look like this:
int config_setting_lookup_string(const config_setting_t *setting,
const char *name, const char **value)
{
config_setting_t *member = config_setting_get_member(setting, name);
if(! member)
return(CONFIG_FALSE);
if(config_setting_type(member) != CONFIG_TYPE_STRING)
return(CONFIG_FALSE);
const char *string = config_setting_get_string(member);
value = &string;
return(CONFIG_TRUE);
}
If you're calling a function that needs a const char**, you could do it like this:
const char *s = getString();
myFunction(&s);
Since s is allocated on the stack in the above example, if you want to return a const char** from your function, you will need to put it on the heap instead:
const char **sp = malloc(sizeof(const char *));
*sp = getString();
return sp;
HTH
string in your case is a local, so taking the address of it is a bad idea since the memory for the local can (and likely will be) re-used for other purposes when you leave the method. In general, it is not a good idea to use the address of a local variable outside of its scope.
What are you trying to achieve?
No, you can't change config_setting_lookup_string() in the way you've described. You're returning a pointer to the string variable, but as soon as that function ends that variable goes out of scope and is destroyed.
You can, however, fix your initial problem quite easily. Leave the definition of config_setting_lookup_string() as it is, and call it like so:
const char *fileName = NULL;
config_setting_lookup_string(foo, "bar", &fileName);
From the added information, it seems what you are trying to do is call a function which wants to return a string through one of the function arguments. The best way to do this in my opinion would be something like:
const char* fileName;
config_setting_lookup_string(..., &fileName);
(...)
return fileName;
This will allocate room for a const char* on the stack. The function call will fill the pointer with the address of the string it wants to return. This pointer value can then be passed out of the function if needed (unlike the pointer to the pointer, which would point to the stack, and be invalid when the function returns).
Note that initializing fileName with "getString()" would presumably leak memory, since the pointer to the returned string would be overwritten, and the string never deallocated.
You need the two lines. However, string is a local variable on the stack, once it goes out of scope, you may not have a pointer to the data returned by getString().
I like nornagon and caf's solution,
const char *fileName;
config_setting_lookup_string(foo, "bar", &fileName);
but if you can change config_setting_lookup_string you could also do it this way:
int config_setting_lookup_string(..., const char *&value)
{
...
const char *string = config_setting_get_string(member);
value = string;
...
}
const char *fileName;
config_setting_lookup_string(foo, "bar", fileName);
If you return stringPtr, you will be returning a pointer to a local variable (string). So no, you can't do that.
Why are you trying to do this? That might allow us to make better suggestions.
Update:
Okay, now I see what you're trying to do. You're doing it wrong:
value = &string;
If value is meant as an output parameter, the above line cannot work because you're assigning to a local variable.
Don't let the extra level of indirection confuse you. If you were writing a function that had an output parameter of type T, you'd write it as:
void foo(T* value)
{
*value = GetT();
}
Now replace T with const char*:
...
*value = string;
...
And now you're not involving any temporary, local variables. Of course, that's how the code was originally written (and that part of it was correct), so that doesn't really help you. To address your intent, you should:
Make config_setting_lookup_string do assert(value != NULL).
Audit the callers of the function and fix them to stop passing garbage. They should be doing:
const char* foo;
config_setting_lookup_string(..., &foo);
and NOT:
const char** foo;
config_setting_lookup_string(..., foo);

Resources