Today I was told, that this code:
int main(){
char *a;
a = "foobar";
/* a used later to strcpy */
return 0;
}
Is bad, and may lead to problems and errors.
However, my code worked without any problems, and I don't understand, what is the difference between this, and
int main(){
char *a = "foobar";
/* a used later to strcpy */
return 0;
}
Which was described to me as the "correct" way.
Could someone describe, why these two codes are different?
And, if the first one may be problematic, show an example of this?
Functionally, they are the same.
In the former snippet, a is assigned to a string literal; in the latter, a is initialized with a string literal.
In both cases, a points to string literal (which can't be modified).
There's no reason to consider one as more correct than the other. I'd prefer the latter - but that's just my personal preference.
Both snippets are equally bad, because both end in a non const pointer pointing to const data. If the non const pointer is used to (try to) change the data, you will get Undefined Behaviour: everything can happen from it works to program crashes including modifying instruction is ignored.
The correct way is to either use a const pointer or to initialize a non const array.
const char *a = "foobar";
or
char a[] = "foobar";
But beware, in latter case you have a true array not a pointer, so you could also do if you really need pointer semantics:
char _a[] = "foobar";
char *a = _a;
There are some places that have coding standards, for instance to help with static code analysis by tools like Coverity.
A coding practice rule that I have seen several places is that variables should always be declared initialized to simplify things to make analysis easier.
You second snippet hews more closely to that rule than the first, as its impossible to insert new code where a could be used uninitialized.
That's a positive benefit when it comes to code maintenance.
Related
I've always used string constants in C as one of the following
char *filename = "foo.txt";
const char *s = "bar"; /* preferably this or the next one */
const char * const s3 = "baz":
But, after reading this, now I'm wondering, should I be declaring my string constants as
const char s4[] = "bux";
?
Please note that linked question suggested as a duplicate is different because this one is specifically asking about constant strings. I know how the types are different and how they are stored. The array version in that question is not const-qualified. This was a simple question as to whether I should use constant array for constant strings vs. the pointer version I had been using. The answers here have answered my question, when two days of searching on SO and Google did not yield an exact answer. Thanks to these answers, I've learned that the compiler can do special things when the array is marked const, and there are indeed (at least one) case where I will now be using the array version.
Pointer and arrays are different. Defining string constants as pointers or arrays fits different purposes.
When you define a global string constant that is not subject to change, I would recommend you make it a const array:
const char product_name[] = "The program version 3";
Defining it as const char *product_name = "The program version 3"; actually defines 2 objects: the string constant itself, which will reside in a constant segment, and the pointer which can be changed to point to another string or set to NULL.
Conversely, defining a string constant as a local variable would be better done as a local pointer variable of type const char *, initialized with the address of a string constant:
int main() {
const char *s1 = "world";
printf("Hello %s\n", s1);
return 0;
}
If you define this one as an array, depending on the compiler and usage inside the function, the code will make space for the array on the stack and initialize it by copying the string constant into it, a more costly operation for long strings.
Note also that const char const *s3 = "baz"; is a redundant form of const char *s3 = "baz";. It is different from const char * const s3 = "baz"; which defines a constant pointer to a constant array of characters.
Finally, string constants are immutable and as such should have type const char []. The C Standard purposely allows programmers to store their addresses into non const pointers as in char *s2 = "hello"; to avoid producing warnings for legacy code. In new code, it is highly advisable to always use const char * pointers to manipulate string constants. This may force you to declare function arguments as const char * when the function does not change the string contents. This process is known as constification and avoid subtile bugs.
Note that some functions violate this const propagation: strchr() does not modify the string received, declared as const char *, but returns a char *. It is therefore possible to store a pointer to a string constant into a plain char * pointer this way:
char *p = strchr("Hello World\n", 'H');
This problem is solved in C++ via overloading. C programmers must deal with this as a shortcoming. An even more annoying situation is that of strtol() where the address of a char * is passed and a cast is required to preserve proper constness.
The linked article explores a small artificial situation, and the difference demonstrated vanishes if you insert const after * in const char *ptr = "Lorum ipsum"; (tested in Apple LLVM 10.0.0 with clang-1000.11.45.5).
The fact the compiler had to load ptr arose entirely from the fact it could be changed in some other module not visible to the compiler. Making the pointer const eliminates that, and the compiler can prepare the address of the string directly, without loading the pointer.
If you are going to declare a pointer to a string and never change the pointer, then declare it as static const char * const ptr = "string";, and the compiler can happily provide the address of the string whenever the value of ptr is used. It does not need to actually load the contents of ptr from memory, since it can never change and will be known to point to wherever the compiler chooses to store the string. This is then the same as static const char array[] = "string";—whenever the address of the array is needed, the compiler can provide it from its knowledge of where it chose to store the array.
Furthermore, with the static specifier, ptr cannot be known outside the translation unit (the file being compiled), so the compiler can remove it during optimization (as long as you have not taken its address, perhaps when passing it to another routine outside the translation unit). The result should be no differences between the pointer method and the array method.
Rule of thumb: Tell the compiler as much as you know about stuff: If it will never change, mark it const. If it is local to the current module, mark it static. The more information the compiler has, the more it can optimize.
From the performance perspective, this is a fairly small optimization which makes sense for low-level code that needs to run with the lowest possible latency.
However, I would argue that const char s3[] = "bux"; is better from the semantic perspective, because the type of the right hand side is closer to type of the left hand side. For that reason, I think it makes sense to declare string constants with the array syntax.
The question may seem elementary but it's about a choice.
I have seen these in same code base and got a little confused over the approach.
We do an empty string initialisation so we don't get dereference error (I think).
static char *name=(char *)"";
and majority cases I have seen this
static char *name=NULL;
and on rare occasions have seen this too.
static char *name=(char *) 0;
Can we call any of these practices as standard and is universally recommended? If yes on what logic?
The two versions with the cast are non-standard; the cast is not necessary in either case (though there is more justification for it in the second of them).
The initializations do two different jobs — both are good jobs, but they are different.
static char *name = "";
Here, name is a valid pointer to an empty string. It can be passed to functions that expect a valid pointer but where an empty string is sensible.
static char *name = NULL;
static char *name = 0;
In both these cases, name is a null pointer — it doesn't point to anywhere valid. Dereferencing name after either of these initializations leads to undefined behaviour, which means your program typically (but not necessarily) crashes. You need to make the variable point at something valid before dereferencing it.
If an input const string is being modified in some way (which is resulting in C compiler warning), what is the best way to handle it - typecasting it to a new variable and then using it OR duplicating it and using it and then freeing it. Or is there any other way to handle this type of scenario. please suggest. Any help would be appreciated.
//Typecasting
const char * s1;
char * s2 = (char *)s1;
//Duplicate and free
const char * s1;
char * s2 = strdup( s1 );
free(s2)
EDIT: It is a C compiler; not C++. I am not sure whether in typecasting, s2 will be a new copy of string s1 or will it be pointing to the original string s1?
Thanks for the answers. I have one more doubt-
const char * c1;
const char * c2 = c1;
Is the above assignment valid?
Casting away const here might shut the compiler up but will lead to runtime failures. Make a copy of the string and work on that.
Casting away const does not copy the contents of the memory. It just creates a pointer to the same memory and tells the compiler that it can go right ahead and write to that memory. If the memory is read only you have a protection fault. More seriously you can have correctness problems which are hard to debug. Don't cast away const.
Of course, if you need to modify a variable and have those modifications visible to the caller, then you should not make it const in the first place. On the other hand, if the modifications are meant to be private to the function, then duplication of the const parameter is best.
As a broad rule you should attempt to avoid casts if at all possible. Casts are one of the most frequent sources of errors.
If the array that s1 points to is const then you must not modify it; doing so gives undefined behaviour. In that case, if you need to modify it, then you will have to take a copy. UPDATE: Typecasting will give you a pointer to the same array, which is still const, so you still must not modify (even though trying to do so no longer gives a compile error).
If you can guarantee that the array is not const, then you could modify it by casting away the const from the pointer. In that case, it would be less error-prone if you can enforce that guarantee by not adding a const qualifier to s1 in the first place.
If your input is a const char* you definitely should listen to the compiler warning and leave it alone.
I'm wondering whether static constant variables are thread-safe or not?
Example code snippet:
void foo(int n)
{
static const char *a[] = {"foo","bar","egg","spam"};
if( ... ) {
...
}
}
Any variable that is never modified, whether or not it's explicitly declared as const, is inherently thread-safe.
const is not a guarantee from the compiler that a variable is immutable. const is a promise that you make to the compiler that a variable will never be modified. If you go back on that promise, the compiler will generate an error pointing that out to you, but you can always silence the compiler by casting away constness.
To be really safe you should do
static char const*const a[]
this inhibits modification of the data and all the pointers in the table to be modified.
BTW, I prefer to write the const after the typename such that it is clear at a first glance to where the const applies, namely to the left of it.
In your example the pointer itself can be considered as thread safe. It will be initialized once and won't be modified later.
However, the content of the memory pointed won't be thread-safe at all.
In this example, a is not const. It's an array of pointers to const strings. If you want to make a itself const, you need:
static const char *const a[] = {"foo","bar","egg","spam"};
Regardless of whether it's const or not, it's always safe to read data from multiple threads if you do not write to it from any of them.
As a side note, it's usually a bad idea to declare arrays of pointers to constant strings, especially in code that might be used in shared libraries, because it results in lots of relocations and the data cannot be located in actual constant sections. A much better technique is:
static const char a[][5] = {"foo","bar","egg","spam"};
where 5 has been chosen such that all your strings fit. If the strings are variable in length and you don't need to access them quickly (for example if they're error messages for a function like strerror to return) then storing them like this is the most efficient:
static const char a[] = "foo\0bar\0egg\0spam\0";
and you can access the nth string with:
const char *s;
for (i=0, s=a; i<n && *s; s+=strlen(s)+1);
return s;
Note that the final \0 is important. It causes the string to have two 0 bytes at the end, thus stopping the loop if n is out of bounds. Alternatively you could bounds-check n ahead of time.
static const char *a[] = {"foo","bar","egg","spam"};
In C that would be always thread safe: the sructures would be already created at compile time, thus no extra action is taken at run time, thus no race condition is possible.
Beware the C++ compatibility though. Static const object would be initialized on the first entry into the function, but the initialization is not guaranteed to be thread-safe by the language. IOW this is open to a race condition when two different threads come into the function simultaneously and try to initialize the object in parallel.
But even in C++, POD (plain old data: structures not using C++ features, like in your example) would behave in the C compatible way.
char *strtok(char *s1, const char *s2)
repeated calls to this function break string s1 into "tokens"--that is
the string is broken into substrings,
each terminating with a '\0', where
the '\0' replaces any characters
contained in string s2. The first call
uses the string to be tokenized as s1;
subsequent calls use NULL as the first
argument. A pointer to the beginning
of the current token is returned; NULL
is returned if there are no more
tokens.
Hi,
I have been trying to use strtok just now and found out that if I pass in a char* into s1, I get a segmentation fault. If I pass in a char[], strtok works fine.
Why is this?
I googled around and the reason seems to be something about how char* is read only and char[] is writeable. A more thorough explanation would be much appreciated.
What did you initialize the char * to?
If something like
char *text = "foobar";
then you have a pointer to some read-only characters
For
char text[7] = "foobar";
then you have a seven element array of characters that you can do what you like with.
strtok writes into the string you give it - overwriting the separator character with null and keeping a pointer to the rest of the string.
Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.
Also, becasue strtok keeps a reference to the rest of the string, it's not reeentrant - you can use it only on one string at a time. It's best avoided, really - consider strsep(3) instead - see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string so has the same read-only/segfault issue)
An important point that's inferred but not stated explicitly:
Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstanding in underlying mechanisms so I like to make things as plain as possible.
As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:
int x = 0;
The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.
When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.
Note the subtle difference here: the memory location that reference point x holds is constant (and cannot be changed). However, the value that x points can be changed. You do it in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate commands to the compiler.
When you have a statement like:
char *name = "Tom";
The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.
But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and NULL. Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.
When your program is run, the final step occurs: setting the pointer to char's value (which isn't constant) to the memory address of that automatically created string (which is constant).
So a char * is not read-only. Only a const char * is read-only. But your problem in this case isn't that char *s are read-only, it's that your pointer references a read-only regions of memory.
I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.
I hope this was helpful. ;)
I blame the C standard.
char *s = "abc";
could have been defined to give the same error as
const char *cs = "abc";
char *s = cs;
on grounds that string literals are unmodifiable. But it wasn't, it was defined to compile. Go figure. [Edit: Mike B has gone figured - "const" didn't exist at all in K&R C. ISO C, plus every version of C and C++ since, has wanted to be backward-compatible. So it has to be valid.]
If it had been defined to give an error, then you couldn't have got as far as the segfault, because strtok's first parameter is char*, so the compiler would have prevented you passing in the pointer generated from the literal.
It may be of interest that there was at one time a plan in C++ for this to be deprecated (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc). But 12 years later I can't persuade either gcc or g++ to give me any kind of warning for assigning a literal to non-const char*, so it isn't all that loudly deprecated.
[Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]
In brief:
char *s = "HAPPY DAY";
printf("\n %s ", s);
s = "NEW YEAR"; /* Valid */
printf("\n %s ", s);
s[0] = 'c'; /* Invalid */
If you look at your compiler documentation, odds are there is a option you can set to make those strings writable.