How Should I Define/Declare String Constants - c

I've always used string constants in C as one of the following
char *filename = "foo.txt";
const char *s = "bar"; /* preferably this or the next one */
const char * const s3 = "baz";
But, after reading this, now I'm wondering, should I be declaring my string constants as
const char s4[] = "bux";
?
Please note that the linked question suggested as a duplicate is different, because this one is specifically asking about constant strings. I know how the types are different and how they are stored. The array version in that question is not const-qualified. This was a simple question as to whether I should use a constant array for constant strings vs. the pointer version I had been using. The answers here have answered my question, when two days of searching on SO and Google did not yield an exact answer. Thanks to these answers, I've learned that the compiler can do special things when the array is marked const, and there is indeed at least one case where I will now be using the array version.

Pointers and arrays are different. Defining string constants as pointers or arrays fits different purposes.
When you define a global string constant that is not subject to change, I would recommend you make it a const array:
const char product_name[] = "The program version 3";
Defining it as const char *product_name = "The program version 3"; actually defines 2 objects: the string constant itself, which will reside in a constant segment, and the pointer which can be changed to point to another string or set to NULL.
Conversely, defining a string constant as a local variable would be better done as a local pointer variable of type const char *, initialized with the address of a string constant:
#include <stdio.h>

int main(void) {
const char *s1 = "world";
printf("Hello %s\n", s1);
return 0;
}
If you define this one as an array, depending on the compiler and usage inside the function, the code will make space for the array on the stack and initialize it by copying the string constant into it, a more costly operation for long strings.
Note also that const char const *s3 = "baz"; is a redundant form of const char *s3 = "baz";. It is different from const char * const s3 = "baz"; which defines a constant pointer to a constant array of characters.
Finally, string constants are immutable and as such should have type const char []. The C Standard purposely allows programmers to store their addresses into non-const pointers, as in char *s2 = "hello";, to avoid producing warnings for legacy code. In new code, it is highly advisable to always use const char * pointers to manipulate string constants. This may force you to declare function arguments as const char * when the function does not change the string contents. This process is known as constification, and it avoids subtle bugs.
Note that some functions violate this const propagation: strchr() does not modify the string received, declared as const char *, but returns a char *. It is therefore possible to store a pointer to a string constant into a plain char * pointer this way:
char *p = strchr("Hello World\n", 'H');
This problem is solved in C++ via overloading. C programmers must deal with this as a shortcoming. An even more annoying situation is that of strtol() where the address of a char * is passed and a cast is required to preserve proper constness.

The linked article explores a small artificial situation, and the difference demonstrated vanishes if you insert const after * in const char *ptr = "Lorum ipsum"; (tested in Apple LLVM 10.0.0 with clang-1000.11.45.5).
The fact the compiler had to load ptr arose entirely from the fact it could be changed in some other module not visible to the compiler. Making the pointer const eliminates that, and the compiler can prepare the address of the string directly, without loading the pointer.
If you are going to declare a pointer to a string and never change the pointer, then declare it as static const char * const ptr = "string";, and the compiler can happily provide the address of the string whenever the value of ptr is used. It does not need to actually load the contents of ptr from memory, since it can never change and will be known to point to wherever the compiler chooses to store the string. This is then the same as static const char array[] = "string";—whenever the address of the array is needed, the compiler can provide it from its knowledge of where it chose to store the array.
Furthermore, with the static specifier, ptr cannot be known outside the translation unit (the file being compiled), so the compiler can remove it during optimization (as long as you have not taken its address, perhaps when passing it to another routine outside the translation unit). The result should be no differences between the pointer method and the array method.
Rule of thumb: Tell the compiler as much as you know about stuff: If it will never change, mark it const. If it is local to the current module, mark it static. The more information the compiler has, the more it can optimize.

From the performance perspective, this is a fairly small optimization which makes sense for low-level code that needs to run with the lowest possible latency.
However, I would argue that const char s3[] = "bux"; is better from the semantic perspective, because the type of the right hand side is closer to type of the left hand side. For that reason, I think it makes sense to declare string constants with the array syntax.

Related

where is the practical use for const char*,volatile const char*,char*const. example of applications will help understand better

const char*
volatile const char*
char *const
In C, under what circumstances are these used? An example would help me understand better.
const char * and char const * are equivalent: both declare a pointer to characters that are constant, i.e. the string can't be modified through the pointer.
Example:
const char string1[] = "foo"; // Create a constant, unmodifiable string
const char string2[] = "bar";
const char *s = string1;
// s[0] = 'o'; ERROR: Attempting to modify the constant data
s = string2; // Okay, makes s point to another constant string
char * const makes the pointer variable itself constant: the pointer can't be modified to point somewhere else. The string contents can be modified though.
Example:
char string1[] = "foo";
char string2[] = "bar";
char * const s = string1; // Initialize to point to string in string1
s[0] = 'o'; // Okay, string1 is now "ooo"
// s = string2; ERROR: Attempting to modify a constant variable
These const qualifiers can be combined:
const char string1[] = "foo";
const char string2[] = "bar";
const char * const s = string1; // Initialize to point to string in string1
// s[0] = 'o'; ERROR: Attempting to modify the constant data
// s = string2; ERROR: Attempting to modify a constant variable
As @Someprogrammerdude has covered the language implications of the const qualifier, I will focus on some implementation aspects.
Embedded (especially microcontroller) programmers need to control where the data is placed. If you have a lookup table or data which does not change you want to put it into the FLASH memory and not waste the precious RAM.
Most (known to me) embedded implementations will place const data in the .rodata segment which usually is physically located in the non-volatile memory (mainly FLASH).
after edit:
const char* - pointer is not const, data referenced by the pointer is const
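const char const * - the second const is redundant; since C99 a duplicated qualifier is ignored, so this means the same as const char * (compilers usually warn about it)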
const char * const - pointer is const, data referenced by the pointer is const
volatile const char* - pointer is not const, data referenced by the pointer is const but is side effect prone. It means that you can't change that data, but something else can. An example is a read-only hardware register mapped into the address space.
const *char - it is wrong syntax and it means nothing
there is no constant in C so I assume you mean const.
const char const * is the same as const char *, so it makes no sense to repeat the const.
const *char is not legal C syntax
Anyway - your main question seems to be
Why and when should I use const in my code
There are several reasons. Using const gives your compiler some extra information about what you want to do. The compiler can make a number of decisions based on that information like doing some optimization, deciding where in memory to place your object and even catching programming bugs.
Another benefit is that you can provide the same information to users of your code. For instance if you write a function that I'm going to use and do it like:
void foo(char * string);
I can't know whether foo will change the contents of string. I'll have to read some additional documentation to find out.
But if you do
void foo(const char * string);
I know that your function foo will not change the string that I pass to it. Further, I would know that it's safe to pass a string literal to that function. In other words - by using const you have given me some extra information in the function prototype and thereby made it easier to use your code.
Another benefit is that it can help you catch mistakes. Let's say you design the function foo so that it shouldn't modify string. One year later you (or a co-worker) need to make some updates to the function. Meanwhile you have forgotten (or your co-worker didn't realize) that foo shouldn't change string and therefore do stuff like string[8] = 'A'. If you originally declared foo using const, the compiler will issue an error for that code and thereby help you to avoid mistakes.

C - Enforcing string parameter to be in read-only memory

I'm optimizing some code, and I have a function like this:
const char * gStrPtr = NULL;
void foo (const char *str) {
gStrPtr = strdup(str);
}
As of now, foo() is only called with constant strings. eg:
const char fooStr[]="Some really long string...";
foo(fooStr);
Notice that because it's always called with a constant, I should be able to just do:
void foo (const char *str) {
gStrPtr=str;
}
But, it opens up a sharp stick: if someone in the future breaks the convention, and tries calling foo() with a dynamic copy of a string that is later freed, it could cause undefined behavior.
I'm wondering if it's possible to create a compile-time or even a run-time check that checks if str is in read-only memory to avoid expensive bug-chases down the road.
Note: if I assume str is a string literal, then I can do it with a macro as so:
#define foo(str) foo_func("" str)
which will cause compile errors on non string-literals. But it also does not accept pointers to const chars.
EDIT
I thought I would post this after the discussion below. @CraigEtsy pointed out the use of __builtin_constant_p, which is a best-effort approach at this problem (but will likely suffice for my needs). I did the following tests with this, and got these results:
void foo(const char *str) {
if (__builtin_constant_p(*str))
printf("%s is constant\n", str);
else
printf("%s is not constant\n", str);
}
const char globalArray[] = "globalArray";
const char *globalPtr = "globalPtr";
int main()
{
const char localArray[]="localArray";
const char *localPtr="localPtr";
char localNonConst[]="localNonConst";
foo("literal"); // constant
foo(localArray); // not constant
foo(localPtr); // constant
foo(globalArray); // constant
foo(globalPtr); // not constant
foo(localNonConst); // not constant
}
And when compiled with -O3, it gave results:
literal is constant
localArray is not constant
localPtr is constant
globalArray is constant
globalPtr is not constant
localNonConst is not constant
So, for my particular case, I can just switch the const char arr[]="str"'s to const char * arr="str", and then, in my foo(), I can check if the value is constant, and allocate memory and raise a runtime warning if not (and mark a flag so I know whether to free the pointer later on...).
I don't think there's any reasonable way to enforce this at runtime, at least not without machinery that would be many orders of magnitude more expensive than just calling strdup.
If the function is only supposed to take immutable strings as arguments (that's the word you're looking for -- immutable, in the sense that its lifetime will be the remainder of the process's lifetime and its contents will not change for the remainder of its lifetime), this needs to be a documented part of its interface contract.

Differences between moments of char pointer assignment

Today I was told, that this code:
int main(){
char *a;
a = "foobar";
/* a used later to strcpy */
return 0;
}
Is bad, and may lead to problems and errors.
However, my code worked without any problems, and I don't understand, what is the difference between this, and
int main(){
char *a = "foobar";
/* a used later to strcpy */
return 0;
}
Which was described to me as the "correct" way.
Could someone describe, why these two codes are different?
And, if the first one may be problematic, show an example of this?
Functionally, they are the same.
In the former snippet, a is assigned to a string literal; in the latter, a is initialized with a string literal.
In both cases, a points to a string literal (which can't be modified).
There's no reason to consider one as more correct than the other. I'd prefer the latter - but that's just my personal preference.
Both snippets are equally bad, because both end up with a non-const pointer pointing to data that must not be modified. If the non-const pointer is used to (try to) change the data, you get undefined behaviour: anything can happen, from the code appearing to work, to the modifying instruction being ignored, to a program crash.
The correct way is to either use a pointer to const or to initialize a non-const array.
const char *a = "foobar";
or
char a[] = "foobar";
But beware: in the latter case you have a true array, not a pointer. If you really need pointer semantics, you could also write:
char _a[] = "foobar";
char *a = _a;
There are some places that have coding standards, for instance to help with static code analysis by tools like Coverity.
One coding rule that I have seen in several places is that variables should always be initialized where they are declared, which makes analysis easier.
Your second snippet hews more closely to that rule than the first, as it's impossible to insert new code where a could be used uninitialized.
That's a positive benefit when it comes to code maintenance.

Pointer to a character constant

Sorry for being such a dumb here. Can't sort this out for myself.
In a header file there is a Macro like this.
#define kOID "1.3.6.1.4.1.1.1.2.4.0"
How to declare and initialize a char pointer to this data without creating a copy of this string?
Preprocessor macros are nothing but a textual substitution. Thus if you write
const char *pointer = kOID;
the preprocessor will substitute the text with
const char *pointer = "1.3.6.1.4.1.1.1.2.4.0";
One thing to bear in mind is that the const qualifier is strongly advisable: once the textual substitution is made, the pointer refers to a string literal, which is typically placed in a read-only segment and must not be modified.
Also be careful to have the macro visible at the point where you'd like to declare that pointer.
Assuming that you're not planning to change the contents of this string, you can simply use:
char* p = kOID;
The string will typically reside in a read-only section of the program, so any attempt to change its contents is undefined behavior and will usually result in a memory access violation at runtime. So for your own safety, you should generally use:
const char* p = kOID;
Thus, any attempt to change the contents of the string pointed by p will lead to a compile-time error instead of a runtime error. The former is typically much easier to track-down and fix than the latter.
To summarize the const issue, here are the options that you can use:
char* p = kOID;
char* const p = kOID; // compilation error if you change the pointer
const char* p = kOID; // compilation error if you change the pointed data
const char* const p = kOID; // compilation error if you change either one of them
UPDATE - Memory Usage Considerations:
Please note that every such declaration may result in additional memory usage, adding up to the length of the string plus one character, plus 4 or 8 bytes for the pointer (depending on your system). Now, the pointer is perhaps less of an issue, but the string itself might take up considerable memory if you instantiate it in several places in the code. So if you're planning to use the string in various places within your program, then you should probably declare it globally in one place.
In addition, please note that the string may reside either in the code-section of the program or in the data-section of the program. Depending on your memory partitions, you may prefer having it in one place over the other.
Include the header file first.
#include <header.h>
Add the defined constant
char * s = kOID;
This will compile fine. However, as kOID is a string literal, it will typically be stored in read-only memory of your program. So if you modify the string through s, it will likely cause a segmentation fault. The workaround is to make s a pointer to const.
const char * s = kOID;
Now if you compile the program, the compiler will check any assignment through s and report it accordingly.
a.c: In function ‘main’:
a.c:10:5: error: assignment of read-only location ‘*s’
So you'll be safe.
To add to what has been said by others, also you can initialize your array this way:
const char some_string[] = kOID;
This is similar to const char *const some_string = kOID;. Possibly, it may lead to additional memory allocation but this depends on compiler.

Are static const variables thread-safe?

I'm wondering whether static constant variables are thread-safe or not?
Example code snippet:
void foo(int n)
{
static const char *a[] = {"foo","bar","egg","spam"};
if( ... ) {
...
}
}
Any variable that is never modified, whether or not it's explicitly declared as const, is inherently thread-safe.
const is not a guarantee from the compiler that a variable is immutable. const is a promise that you make to the compiler that a variable will never be modified. If you go back on that promise, the compiler will generate an error pointing that out to you, but you can always silence the compiler by casting away constness.
To be really safe you should do
static char const*const a[]
This prevents modification of both the data and the pointers in the table.
BTW, I prefer to write the const after the typename such that it is clear at a first glance to where the const applies, namely to the left of it.
In your example the pointer itself can be considered as thread safe. It will be initialized once and won't be modified later.
However, the content of the memory pointed to won't be thread-safe at all.
In this example, a is not const. It's an array of pointers to const strings. If you want to make a itself const, you need:
static const char *const a[] = {"foo","bar","egg","spam"};
Regardless of whether it's const or not, it's always safe to read data from multiple threads if you do not write to it from any of them.
As a side note, it's usually a bad idea to declare arrays of pointers to constant strings, especially in code that might be used in shared libraries, because it results in lots of relocations and the data cannot be located in actual constant sections. A much better technique is:
static const char a[][5] = {"foo","bar","egg","spam"};
where 5 has been chosen such that all your strings fit. If the strings are variable in length and you don't need to access them quickly (for example if they're error messages for a function like strerror to return) then storing them like this is the most efficient:
static const char a[] = "foo\0bar\0egg\0spam\0";
and you can access the nth string with:
const char *s;
int i;
/* advance i together with s, otherwise the loop always runs to the end */
for (i=0, s=a; i<n && *s; i++, s+=strlen(s)+1);
return s;
Note that the final \0 is important. It causes the string to have two 0 bytes at the end, thus stopping the loop if n is out of bounds. Alternatively you could bounds-check n ahead of time.
static const char *a[] = {"foo","bar","egg","spam"};
In C that is always thread safe: the structures are already created at compile time, so no extra action is taken at run time and no race condition is possible.
Beware of C++ compatibility, though. There, a function-local static const object with a non-constant initializer is initialized on first entry into the function, and before C++11 the language did not guarantee that this initialization is thread-safe. In other words, it was open to a race condition when two different threads entered the function simultaneously and tried to initialize the object in parallel. Since C++11, the language guarantees that the initialization happens exactly once.
But even in C++, POD (plain old data: structures not using C++ features, like in your example) would behave in the C compatible way.

Resources