I'm wondering whether static constant variables are thread-safe or not?
Example code snippet:
void foo(int n)
{
static const char *a[] = {"foo","bar","egg","spam"};
if( ... ) {
...
}
}
Any variable that is never modified, whether or not it's explicitly declared as const, is inherently thread-safe.
const is not a guarantee from the compiler that a variable is immutable. const is a promise that you make to the compiler that a variable will never be modified. If you go back on that promise, the compiler will generate an error pointing that out to you, but you can always silence the compiler by casting away constness.
To be really safe you should do
static char const*const a[]
This prevents modification of both the characters and the pointers in the table.
BTW, I prefer to write the const after the typename such that it is clear at a first glance to where the const applies, namely to the left of it.
In your example the pointer itself can be considered as thread safe. It will be initialized once and won't be modified later.
However, the contents of the memory it points to are not thread-safe at all.
In this example, a is not const. It's an array of pointers to const strings. If you want to make a itself const, you need:
static const char *const a[] = {"foo","bar","egg","spam"};
Regardless of whether it's const or not, it's always safe to read data from multiple threads if you do not write to it from any of them.
As a side note, it's usually a bad idea to declare arrays of pointers to constant strings, especially in code that might be used in shared libraries, because it results in lots of relocations and the data cannot be located in actual constant sections. A much better technique is:
static const char a[][5] = {"foo","bar","egg","spam"};
where 5 has been chosen such that all your strings fit. If the strings are variable in length and you don't need to access them quickly (for example if they're error messages for a function like strerror to return) then storing them like this is the most efficient:
static const char a[] = "foo\0bar\0egg\0spam\0";
and you can access the nth string with:
const char *s;
int i;
for (i=0, s=a; i<n && *s; i++, s+=strlen(s)+1);
return s;
Note that the final \0 is important. It causes the string to have two 0 bytes at the end, thus stopping the loop if n is out of bounds. Alternatively you could bounds-check n ahead of time.
static const char *a[] = {"foo","bar","egg","spam"};
In C that would always be thread-safe: the structures are already created at compile time, so no extra action is taken at run time and no race condition is possible.
Beware of C++ compatibility, though. There, a static const object would be initialized on the first entry into the function, and before C++11 the language did not guarantee that this initialization is thread-safe. In other words, under those older standards there is a race condition when two different threads enter the function simultaneously and try to initialize the object in parallel.
But even in C++, POD (plain old data: structures not using C++ features, like in your example) would behave in the C compatible way.
Let's take the following two functions:
#include<stdio.h>
void my_print1(char* str) {
// str = "OK!";
printf("%s\n", str);
}
void my_print2(char* const str) {
// str = "OK!";
printf("%s\n", str);
}
They both produce the same assembly:
How then is the const-ness enforced here? For example, if I un-comment str = "OK!"; it will of course work in the first function but not the second (error: assignment of read-only parameter ‘str’).
Is the const-ness of a local variable just a compiler construct that the compiler is responsible for checking, or how does it work if the assembly for the two functions is the same? Note: this is C only, not C++ (as I think they treat const differently).
Correct, on most implementations it's just a compiler construct.
On a typical mainstream OS implementation, there is a way to place const objects having static storage duration in memory that is actually write-protected by the CPU's memory-management unit (MMU), e.g. a .text or .rodata section. Then attempts to write it, if not prevented at compile time, will cause a trap at runtime. But hardware write protection applies to large blocks of memory (e.g. whole pages). There is no good way to do this with auto objects, such as local variables or function parameters, which live in stack memory or in registers. On the stack, since they are mixed in with non-const variables, hardware write protection is not fine-grained enough to apply to them, and in any case would be very expensive to be continually changing (it needs a call to the operating system). And registers on most machines cannot be made read-only at all.
Since there is no good way to protect them at runtime, compilers often generate the exact same code for const auto objects as for non-const.
You might see differences in some cases, since const informs the compiler that the object's value is not supposed to change, and therefore the compiler can assume that it does not. For instance, if you pass a pointer to a non-const object to another function, the compiler has to assume that the value of the object may have been changed, and will reload it from memory after each function call. But a const object may have its value cached in a register across the function call, or just optimized into an immediate constant if possible.
The const qualifier on function parameters and other local variables normally has no effect on the generated code. It just tells the compiler to prevent assigning to the variable.
Theoretically, it could generate code that prevents modifying the variable through other means. E.g. if you had
void my_print2(char* const str) {
*(char **)&str = "OK!";
printf("%s\n", str);
}
The assignment causes undefined behavior, but won't cause an error (although the compiler might warn about casting away constness). But the compiler could theoretically store str in memory that's marked read-only; in that case, the assignment would cause a segmentation fault. This would not normally be done for function parameters because it's difficult to reconcile that with using the stack for automatic data. (Nate Eldredge's answer explains this better.)
The compiler enforces const by refusing to generate non-compliant code. If the my_print2 source had attempted to modify str, the compiler would have issued an error message.
With respect to your code however:
void my_print2(char* const str)...
It's kind of pointless as it can only limit what the function can do with the value of the pointer itself, not what it can do to memory it points to, so:
void my_print2(char* const str)
{
str++; // Not allowed, compile time error.
*str = 'A'; // Okay.
}
Functions that do not need to modify the content of what is pointed to, should be declared as const <type> * to give callers confidence that your function, print in this case, won't change their data:
void my_print3(const char* str)
{
str++; // Okay
*str = 'A'; // Not allowed, compile time error.
}
void my_print4(char const* str)
{
str++; // Okay
*str = 'A'; // Not allowed, compile time error.
}
I've always used string constants in C as one of the following
char *filename = "foo.txt";
const char *s = "bar"; /* preferably this or the next one */
const char * const s3 = "baz";
But, after reading this, now I'm wondering, should I be declaring my string constants as
const char s4[] = "bux";
?
Please note that the linked question suggested as a duplicate is different because this one specifically asks about constant strings. I know how the types are different and how they are stored. The array version in that question is not const-qualified. This was a simple question as to whether I should use a constant array for constant strings vs. the pointer version I had been using. The answers here have answered my question, when two days of searching on SO and Google did not yield an exact answer. Thanks to these answers, I've learned that the compiler can do special things when the array is marked const, and there is indeed at least one case where I will now be using the array version.
Pointers and arrays are different. Defining string constants as pointers or arrays fits different purposes.
When you define a global string constant that is not subject to change, I would recommend you make it a const array:
const char product_name[] = "The program version 3";
Defining it as const char *product_name = "The program version 3"; actually defines 2 objects: the string constant itself, which will reside in a constant segment, and the pointer which can be changed to point to another string or set to NULL.
Conversely, defining a string constant as a local variable would be better done as a local pointer variable of type const char *, initialized with the address of a string constant:
#include <stdio.h>
int main(void) {
const char *s1 = "world";
printf("Hello %s\n", s1);
return 0;
}
If you define this one as an array, depending on the compiler and usage inside the function, the code will make space for the array on the stack and initialize it by copying the string constant into it, a more costly operation for long strings.
Note also that const char const *s3 = "baz"; is a redundant form of const char *s3 = "baz";. It is different from const char * const s3 = "baz"; which defines a constant pointer to a constant array of characters.
Finally, string constants are immutable and as such should have type const char []. The C Standard purposely allows programmers to store their addresses into non-const pointers, as in char *s2 = "hello";, to avoid producing warnings for legacy code. In new code, it is highly advisable to always use const char * pointers to manipulate string constants. This may force you to declare function arguments as const char * when the function does not change the string contents. This process is known as constification and avoids subtle bugs.
Note that some functions violate this const propagation: strchr() does not modify the string received, declared as const char *, but returns a char *. It is therefore possible to store a pointer to a string constant into a plain char * pointer this way:
char *p = strchr("Hello World\n", 'H');
This problem is solved in C++ via overloading. C programmers must deal with this as a shortcoming. An even more annoying situation is that of strtol() where the address of a char * is passed and a cast is required to preserve proper constness.
The linked article explores a small artificial situation, and the difference demonstrated vanishes if you insert const after * in const char *ptr = "Lorum ipsum"; (tested in Apple LLVM 10.0.0 with clang-1000.11.45.5).
The fact the compiler had to load ptr arose entirely from the fact it could be changed in some other module not visible to the compiler. Making the pointer const eliminates that, and the compiler can prepare the address of the string directly, without loading the pointer.
If you are going to declare a pointer to a string and never change the pointer, then declare it as static const char * const ptr = "string";, and the compiler can happily provide the address of the string whenever the value of ptr is used. It does not need to actually load the contents of ptr from memory, since it can never change and will be known to point to wherever the compiler chooses to store the string. This is then the same as static const char array[] = "string";—whenever the address of the array is needed, the compiler can provide it from its knowledge of where it chose to store the array.
Furthermore, with the static specifier, ptr cannot be known outside the translation unit (the file being compiled), so the compiler can remove it during optimization (as long as you have not taken its address, perhaps when passing it to another routine outside the translation unit). The result should be no differences between the pointer method and the array method.
Rule of thumb: Tell the compiler as much as you know about stuff: If it will never change, mark it const. If it is local to the current module, mark it static. The more information the compiler has, the more it can optimize.
From the performance perspective, this is a fairly small optimization which makes sense for low-level code that needs to run with the lowest possible latency.
However, I would argue that const char s3[] = "bux"; is better from the semantic perspective, because the type of the right-hand side is closer to the type of the left-hand side. For that reason, I think it makes sense to declare string constants with the array syntax.
Today I was told, that this code:
int main(){
char *a;
a = "foobar";
/* a used later to strcpy */
return 0;
}
Is bad, and may lead to problems and errors.
However, my code worked without any problems, and I don't understand, what is the difference between this, and
int main(){
char *a = "foobar";
/* a used later to strcpy */
return 0;
}
Which was described to me as the "correct" way.
Could someone explain how these two snippets differ?
And, if the first one may be problematic, show an example of this?
Functionally, they are the same.
In the former snippet, a is assigned to a string literal; in the latter, a is initialized with a string literal.
In both cases, a points to string literal (which can't be modified).
There's no reason to consider one as more correct than the other. I'd prefer the latter - but that's just my personal preference.
Both snippets are equally bad, because both end up with a non-const pointer pointing to data that must not be modified. If that pointer is used to try to change the data, you get undefined behaviour: anything can happen, from the code appearing to work, to the modifying instruction being ignored, to the program crashing.
The correct way is to either use a pointer to const or to initialize a non-const array.
const char *a = "foobar";
or
char a[] = "foobar";
But beware: in the latter case you have a true array, not a pointer. If you really need pointer semantics, you could also do:
char _a[] = "foobar";
char *a = _a;
There are some places that have coding standards, for instance to help with static code analysis by tools like Coverity.
A coding practice rule that I have seen in several places is that variables should always be initialized at their declaration, which makes analysis easier.
Your second snippet hews more closely to that rule than the first, as it's impossible to insert new code where a could be used uninitialized.
That's a positive benefit when it comes to code maintenance.
Consider the following example:
typedef struct Collection
{
...
} Collection;
typedef struct Iterator
{
Collection* collection;
} Iterator;
Iterator is to offer collection-modifying functions, hence it holds address of a non-const Collection. However, it is also to offer non-modifying functions, which are perfectly legal to use with a const Collection.
void InitIterator(Iterator* iter, Collection* coll)
{
iter->collection = coll;
}
void SomeFunction(const Collection* coll)
{
// we want to use just non-modifying functions here, as the Collection is const
Iterator iter;
InitIterator(&iter, coll); // warning, call removes const qualifier
}
I'm looking for a solution in C.
I do see some options:
1. At Init, cast Collection to non-const.
It's probably not undefined behaviour because the object ultimately should not get modified. But it is hair-raising, as having a const object, doing this is asking for trouble. The Iterator is to become a widely used, generic mechanism for working with collections. Having no compiler warnings when one is about to modify a const collection is really bad.
2. Two iterator types, one being a read-only version with a const Collection* member.
This complicates usage, potentially requires duplication of some functions, possibly reduced efficiency due to translation step. I really do not want to complicate API and have two different structs along with two sets of functions.
3. Iterator having both pointers, and Init taking both Collection pointers.
typedef struct Iterator
{
const Collection* collectionReadable; // use this when reading
Collection* collectionWritable; // use this when writing
} Iterator;
When having a const Collection, the non-const argument has to become NULL and ultimately trying to modify the collection would crash (trying to dereference NULL pointer), which is good (safe). We have extra storage cost. It is awkward, taking two pointers of the same collection.
I'm also a bit worried about compiler seeing both pointers in same context, meaning that if the Collection gets modified through the writable pointer, the compiler has to realize that the read-only pointer may be pointing to the same object and it needs to be reloaded despite having the const qualifier. Is this safe?
Which one would you pick and why?
Do you know any better approach to this problem?
Yes, it is safe to have two pointers to the same object, one const and the other non-const. The const just means the object cannot be modified via this particular pointer, but it can be modified via other pointers if there are such pointers. However, although it is safe, this does not necessarily mean that you should choose the solution of having two pointers.
For an example of code that works perfectly correctly, see:
int x = 5;
const int *p = &x;
printf("%d\n", *p);
x = 7;
printf("%d\n", *p);
Now *p changes even though p is a pointer to const int.
Your goal here is to avoid code duplication, so what you want is simply to cast the const pointer to a non-const pointer. It's safe as long as you don't modify the object. There are plenty of similar examples in the C standard library. For example, strchr() takes a const char pointer into the string and then returns a char pointer just in case the string wasn't const and you want to modify it via the return value. I would choose the solution adopted in the C standard library, i.e. typecasting.
You have essentially noticed that the const qualifier in C isn't perfect. There are plenty of use cases when you run into problems like this, and the easiest way forward is typically to accept that it isn't perfect and do typecasting.
I have a char pointer which would be used to store a string. It is used later in the program.
I have declared and initialized like this:
char * p = NULL;
I am just wondering if this is good practice. I'm using gcc 4.3.3.
Yes, it's a good idea.
Google Code Style recommends:
Initialize all your variables, even if you don't need them right now.
Initialize pointers with NULL, ints with 0 and floats with 0.0 -- just for better readability.
int i = 0;
double x = 0.0;
char* c = NULL;
You cannot store a string in a pointer.
Your definition of mgt_dev_name is good, but you need to point it somewhere with space for your string. Either malloc() that space or use a previously defined array of characters.
char *mgt_dev_name = NULL;
char data[4200];
/* ... */
mgt_dev_name = data; /* use array */
/* ... */
mgt_dev_name = malloc(4200);
if (mgt_dev_name != NULL) {
/* use malloc'd space */
free(mgt_dev_name);
} else {
/* error: not enough memory */
}
It is good practice to initialize all variables.
If you're asking whether it's necessary, or whether it's a good idea to initialize the variable to NULL before you set it to something else later on: It's not necessary to initialize it to NULL, it won't make any difference for the functionality of your program.
Note that in programming, it's important to understand every line of code - why it's there and what exactly it's doing. Don't do things without knowing what they mean or without understanding why you're doing them.
Another option is to not define the variable until the place in your code where you have access to its initial value. So rather than doing:
char *name = NULL;
...
name = initial_value;
I would change that to:
...
char *name = initial_value;
The compiler will then prevent you from referencing the variable in the part of the code where it has no value. Depending on the specifics of your code this may not always be possible (for example, the initial value is set in an inner scope but the variable has a different lifetime), but moving the definition as late as possible in the code helps prevent errors.
That said, this is only allowed starting with the C99 standard (it's also valid C++). To enable C99 features in gcc, you'll need to either do:
gcc -std=gnu99
or if you don't want gcc extensions to the standard:
gcc -std=c99
No, it is not a good practice, if I understood your context correctly.
If your code actually depends on the mgt_dev_name having the initial value of a null-pointer, then, of course, including the initializer into the declaration is a very good idea. I.e. if you'd have to do this anyway
char *mgt_dev_name;
/* ... and soon after */
mgt_dev_name = NULL;
then it is always a better idea to use initialization instead of assignment
char *mgt_dev_name = NULL;
However, initialization is only good when you can initialize your object with a meaningful useful value. A value that you will actually need. In general case, this is only possible in languages that allow declarations at any point in the code, C99 and C++ being good examples of such languages. By the time you need your object, you usually already know the appropriate initializer for that object, and so can easily come up with an elegant declaration with a good initializer.
In C89/90, on the other hand, declarations can only be placed at the beginning of the block. At that point, in the general case, you won't have meaningful initializers for all of your objects. Should you just initialize them with something, anything (like 0 or NULL) just to have them initialized? No!!! Never do meaningless things in your code. It will not improve anything, regardless of what various "style guides" might tell you. In reality, meaningless initialization might actually cover bugs in your code, making them harder to discover and fix.
Note that even in C89/90 it is always beneficial to strive for better locality of declarations. I.e., a well-known good practice guideline states: always make your variables as local as they can be. Don't pile up all your local object declarations at the very beginning of the function, but rather move them to the beginning of the smallest block that envelops the entire lifetime of the object as tightly as possible. Sometimes it might even be a good idea to introduce a fictive, otherwise unnecessary block just to improve the locality of declarations. Following this practice will help you to provide good useful initializers to your objects in many (if not most) cases. But some objects will remain uninitialized in C89/90 just because you won't have a good initializer for them at the point of declaration. Don't try to initialize them with "something" just for the sake of having them initialized. This will achieve absolutely nothing good, and might actually have negative consequences.
Note that some modern development tools (like MS Visual Studio 2005, for example) will catch run-time access to uninitialized variables in the debug version of the code. I.e., these tools can help you to detect situations when you access a variable before it had a chance to acquire a meaningful value, indicating a bug in the code. But by performing unconditional premature initialization of your variables you essentially kill that capability of the tool and sweep these bugs under the carpet.
This topic has already been discussed here:
http://www.velocityreviews.com/forums/t282290-how-to-initialize-a-char.html
It refers to C++, but it might be useful for you, too.
There are several good answers to this question, one of them has been accepted. I'm going to answer anyway in order to expand on practicalities.
Yes, it is good practice to initialize pointers to NULL, as well as set pointers to NULL after they are no longer needed (i.e. freed).
In either case, it's very practical to be able to test a pointer prior to dereferencing it. Let's say you have a structure that looks like this:
struct foo {
int counter;
unsigned char ch;
char *context;
};
You then write an application that spawns several threads, all of which operate on a single allocated foo structure (safely) through the use of mutual exclusion.
Thread A gets a lock on foo, increments counter and checks for a value in ch. It does not find one, so it does not allocate (or modify) context. Instead, it stores a value in ch so that thread B can do this work instead.
Thread B sees that counter has been incremented, notes a value in ch but isn't sure if thread A has done anything with context. If context was initialized as NULL, thread B no longer has to care what thread A did, it knows context is safe to dereference (if not NULL) or allocate (if NULL) without leaking.
Thread B does its business, thread A reads its context, frees it, then re-initializes it to NULL.
The same reasoning applies to global variables, without the use of threads. It's good to be able to test them in various functions prior to dereferencing them (or attempting to allocate them thus causing a leak and undefined behavior in your program).
When it gets silly is when the scope of the pointer does not go beyond a single function. If you have a single function and can't keep track of the pointers within it, usually this means the function should be re-factored. However, there is nothing wrong with initializing a pointer in a single function, if only to keep uniform habits.
The only time I've ever seen an 'ugly' case of relying on an initialized pointer (before and after use) is in something like this:
void my_free(void **p)
{
if (*p != NULL) {
free(*p);
*p = NULL;
}
}
Not only is dereferencing a type punned pointer frowned upon on strict platforms, the above code makes free() even more dangerous, because callers will have some delusion of safety. You can't rely on a practice 'wholesale' unless you are sure every operation is in agreement.
Probably a lot more information than you actually wanted.
Preferred styles:
in C: char * c = NULL;
in C++: char * c = 0;
My rationale is that if you don't initialize with NULL, and then forget to initialize altogether, the kinds of bugs you will get in your code when dereferencing are much more difficult to trace due to the potential garbage held in memory at that point. On the other hand, if you do initialize to NULL, most of the time you will only get a segmentation fault, which is better, considering the alternative.
Initializing variables even when you don't need them initialized right away is a good practice. Usually, we initialize pointers to NULL, int to 0 and floats to 0.0 as a convention.
int* ptr = NULL;
int i = 0;
float r = 0.0;
It is always good to initialize pointer variables in C++ as shown below:
int *iPtr = nullptr;
char *cPtr = nullptr;
Initializing as above makes tests like the ones below well defined, since a pointer is contextually convertible to bool and a null pointer converts to false. Testing an uninitialized pointer instead may produce compiler warnings and is undefined behaviour:
if(iPtr){
//then do something.
}
if(cPtr){
//then do something.
}