C - Enforcing string parameter to be in read-only memory

I'm optimizing some code, and I have a function like this:
const char *gStrPtr = NULL;
void foo(const char *str) {
    gStrPtr = strdup(str);
}
As of now, foo() is only called with constant strings, e.g.:
const char fooStr[]="Some really long string...";
foo(fooStr);
Notice that because it's always called with a constant, I should be able to just do:
void foo(const char *str) {
    gStrPtr = str;
}
But this leaves a sharp edge: if someone later breaks the convention and calls foo() with a dynamically allocated string that is subsequently freed, gStrPtr will dangle, and using it is undefined behavior.
I'm wondering if it's possible to create a compile-time or even a run-time check that verifies str is in read-only memory, to avoid expensive bug-chases down the road.
Note: if I assume str is a string literal, then I can do it with a macro like so:
#define foo(str) foo_func("" str)
which will cause compile errors for anything that is not a string literal. But that means it also rejects pointers to const char.
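A minimal sketch of how that macro behaves, assuming the non-copying version of the function is renamed foo_func() as in the macro above (demo_literal() is a hypothetical helper added only for illustration):

```c
#include <stddef.h>

static const char *gStrPtr = NULL;

/* Stores the pointer without copying; only safe for strings that
 * outlive every later use of gStrPtr. */
static void foo_func(const char *str) {
    gStrPtr = str;
}

/* Adjacent string literals are concatenated at compile time, so
 * "" str only compiles when str is itself a string literal. */
#define foo(str) foo_func("" str)

/* Hypothetical helper that exercises the macro. */
static const char *demo_literal(void) {
    foo("Some really long string...");   /* OK: a literal */
    /* const char fooStr[] = "x";
     * foo(fooStr);                         would be a syntax error */
    return gStrPtr;
}
```

The rejected case is left as a comment so the sketch still compiles; uncommenting it should produce a syntax error rather than a silent misuse.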
EDIT
I thought I would post this after the discussion below. @CraigEtsy pointed out the use of __builtin_constant_p, which is a best-effort approach to this problem (but will likely suffice for my needs). I ran the following tests with it:
void foo(const char *str) {
    if (__builtin_constant_p(*str))
        printf("%s is constant\n", str);
    else
        printf("%s is not constant\n", str);
}
const char globalArray[] = "globalArray";
const char *globalPtr = "globalPtr";
int main()
{
    const char localArray[]="localArray";
    const char *localPtr="localPtr";
    char localNonConst[]="localNonConst";
    foo("literal"); // constant
    foo(localArray); // not constant
    foo(localPtr); // constant
    foo(globalArray); // constant
    foo(globalPtr); // not constant
    foo(localNonConst); // not constant
}
And when compiled with -O3, it gave results:
literal is constant
localArray is not constant
localPtr is constant
globalArray is constant
globalPtr is not constant
localNonConst is not constant
So, for my particular case, I can just switch the const char arr[] = "str" definitions to const char *arr = "str". Then, in foo(), I can check whether the value is constant and, if it is not, allocate memory and raise a runtime warning (and set a flag so I know whether to free the pointer later on).
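One way this plan might look, as a rough sketch rather than a tested solution. The names foo_impl() and gStrOwned are hypothetical, and the check is moved into a macro so that it is evaluated at each call site. __builtin_constant_p is a GCC/Clang extension whose result depends on the compiler and optimization level, so this stays best-effort: a "constant" answer does not prove that the string outlives its use.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *gStrPtr = NULL;
static int gStrOwned = 0;   /* 1 if gStrPtr is a private heap copy */

/* Hypothetical helper: known_constant is supplied by the foo() macro. */
static void foo_impl(const char *str, int known_constant) {
    const char *next = str;
    int owned = 0;

    if (!known_constant) {
        size_t n = strlen(str) + 1;
        char *copy = malloc(n);
        if (copy == NULL)
            return;                      /* keep the previous value */
        memcpy(copy, str, n);
        fprintf(stderr, "foo: argument not provably constant, copying\n");
        next = copy;
        owned = 1;
    }
    if (gStrOwned)
        free((char *)gStrPtr);           /* release the previous copy */
    gStrPtr = next;
    gStrOwned = owned;
}

/* Best-effort check at the call site (compiler extension). */
#define foo(str) foo_impl((str), __builtin_constant_p(*(str)))
```

Whichever branch is taken, gStrPtr ends up pointing at a string with the same contents; only the ownership flag differs.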

I don't think there's any reasonable way to enforce this at runtime, at least not without machinery that would be many orders of magnitude more expensive than just calling strdup.
If the function is only supposed to take immutable strings as arguments (that's the word you're looking for -- immutable, in the sense that its lifetime will be the remainder of the process's lifetime and its contents will not change for the remainder of its lifetime), this needs to be a documented part of its interface contract.

Related

Is it valid to use "restrict" when there is the potential for reallocating memory (changing the pointer)?

I am attempting some optimization of code, but it is hard to wrap my head around whether "restrict" is useful in this situation or if it will cause problems.
I have a function that is passed two strings (char*) as well as an int (int*).
The second string is copied into memory following the first string, at the position indicated by the int. If this would overrun the allocation of memory for the first string, it must reallocate memory for the first string before doing so. A new pointer is created with the new allocation, and then the original first string pointer is set equal to it.
char* concatFunc (char* restrict first_string, char* const restrict second_string, int* const restrict offset) {
    size_t block = 200000;
    size_t len = strlen(second_string);
    char* result = first_string;
    if(*offset+len+1>block){
        result = realloc(result,2*block);
    }
    memcpy(result+*offset,second_string,len+1);
    *offset+=len;
    return result;
}
The above function is repeatedly called by other functions that are also using the restrict keyword.
char* addStatement(char* restrict string_being_built, ..., int* const restrict offset){
    char new_statement[30] = "example additional text";
    string_being_built = concatFunc(string_being_built,new_statement,offset);
    return string_being_built;
}
So in the concatFunc the first_string is restricted (meaning memory pointed to will not be changed from anywhere else). But then if I am reallocating a pointer that is a copy of that, is that going to cause undefined behavior or is the compiler smart enough to accommodate that?
Basically: What happens when you restrict a pointer parameter, but then change the pointer.
What happens when you restrict a pointer parameter, but then change the pointer.
It depends on how the pointer was changed - and in this case, memcpy() risks UB.
With char* result = first_string;, result inherits the restrict of char* restrict first_string.
After result = realloc(result,2*block);, one of two things is true. Either result is unchanged, and accessing through result does not collide with accesses through second_string or offset. Or result points to new memory, and again accessing through result does not collide with accesses through second_string or offset.
Yet can the compiler know that the newly assigned result has one of the two properties above? After all, realloc() might be a user-defined function, and the compiler should not assume that result still has the restrict property.
Thus memcpy() is in peril.
is the compiler smart enough to accommodate that?
I do not see how it can, other than by warning about the memcpy() usage.
Of course OP can use memmove() instead of memcpy() to avoid the concern.
As I see it, a simplified example would be:
char* concatFunc (char* restrict first_string, char* restrict second_string) {
    int block = rand();
    first_string = foo(first_string, block);
    // first_string at this point may equal second_string,
    // breaking the memcpy() contract
    memcpy(first_string, second_string, block);
    return first_string;
}
Or even simpler
char* concatFunc (char* /* no restrict */ first_string, char* restrict second_string) {
    return memcpy(first_string, second_string, 2);
}
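To make the memmove() suggestion concrete, here is a minimal sketch of a concatenation helper without restrict. The name concat_at() and the caller-tracked capacity parameter are assumptions for illustration, not the original poster's interface, which used a fixed block size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of concatFunc: appends src at *offset, growing
 * the buffer when needed. Returns NULL on allocation failure, in which
 * case the original buffer is still valid and unchanged.
 * src must not point into buf, because realloc() may move buf. */
char *concat_at(char *buf, size_t *capacity, size_t *offset, const char *src) {
    size_t len = strlen(src);
    size_t needed = *offset + len + 1;

    if (needed > *capacity) {
        size_t new_cap = *capacity ? *capacity : 16;
        while (new_cap < needed)
            new_cap *= 2;
        char *grown = realloc(buf, new_cap);
        if (grown == NULL)
            return NULL;
        buf = grown;
        *capacity = new_cap;
    }
    /* memmove() makes no non-overlap promise, unlike memcpy(). */
    memmove(buf + *offset, src, len + 1);
    *offset += len;
    return buf;
}
```

Starting from a NULL buffer with capacity and offset both 0, repeated calls build up the string, and the caller frees the final pointer.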

Where is the practical use for const char*, volatile const char*, and char *const? Examples of applications will help me understand better

const char*
volatile const char*
char *const
In C, under what circumstances are these used? An example will help me understand better.
const char * and char const * are equivalent; both point to characters that are constant, i.e. the string can't be modified through the pointer.
Example:
const char string1[] = "foo"; // Create a constant, unmodifiable string
const char string2[] = "bar";
const char *s = string1;
// s[0] = 'o'; ERROR: Attempting to modify the constant data
s = string2; // Okay, makes s point to another constant string
char * const makes the pointer variable itself constant: the pointer can't be modified to point somewhere else. The string contents can be modified, though.
Example:
char string1[] = "foo";
char string2[] = "bar";
char * const s = string1; // Initialize to point to string in string1
s[0] = 'o'; // Okay, string1 is now "ooo"
// s = string2; ERROR: Attempting to modify a constant variable
These const qualifiers can be combined:
const char string1[] = "foo";
const char string2[] = "bar";
const char * const s = string1; // Initialize to point to string in string1
// s[0] = 'o'; ERROR: Attempting to modify the constant data
// s = string2; ERROR: Attempting to modify a constant variable
As @Someprogrammerdude has covered the language implications of the const qualifier, I will focus on some implementation aspects.
Embedded (especially microcontroller) programmers need to control where the data is placed. If you have a lookup table or data which does not change you want to put it into the FLASH memory and not waste the precious RAM.
Most (known to me) embedded implementations will place const data in the .rodata segment which usually is physically located in the non-volatile memory (mainly FLASH).
after edit:
const char* - pointer is not const, data referenced by the pointer is const
const char const * - the duplicated const is redundant; since C99 it is accepted and means the same as const char * (C89 rejected it)
const char * const - pointer is const, data referenced by the pointer is const
volatile const char* - pointer is not const, data referenced by the pointer is const but is prone to side effects. It means that you can't change that data, but something else can. An example is a read-only hardware register mapped into the address space.
const *char - it is wrong syntax and it means nothing
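A small sketch of the volatile const case. Real hardware would use a fixed address taken from the device documentation; here an ordinary variable (fake_status_register, a made-up name) stands in for the register so the example can run anywhere.

```c
/* Stand-in for a memory-mapped, read-only status register. */
static char fake_status_register = 0x01;

/* const: this code may only read through the pointer.
 * volatile: the value can change behind the program's back, so the
 * compiler must perform every read instead of caching the value. */
static volatile const char *const STATUS = &fake_status_register;

char read_status(void) {
    return *STATUS;
}

/* Simulates the hardware updating the register. */
void hardware_sets_status(char value) {
    fake_status_register = value;
}
```

Writing through STATUS would be rejected at compile time, while hardware_sets_status() plays the role of the "something else" that can still change the data.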
There is no constant keyword in C, so I assume you mean const.
const char const * is the same as const char *, so it makes no sense to repeat the const.
const *char is not legal C syntax.
Anyway, your main question seems to be:
Why and when should I use const in my code
There are several reasons. Using const gives your compiler some extra information about what you want to do. The compiler can make a number of decisions based on that information like doing some optimization, deciding where in memory to place your object and even catching programming bugs.
Another benefit is that you can provide the same information to users of your code. For instance if you write a function that I'm going to use and do it like:
void foo(char * string);
I can't know whether foo will change the contents of string. I'll have to read some additional documentation to find out.
But if you do
void foo(const char * string);
I know that your function foo will not change the string that I pass to it. Further, I would know that it's safe to pass a string literal to that function. In other words, by using const you have given me some extra information in the function prototype and thereby made it easier to use your code.
Another benefit is that it can help you catch mistakes. Let's say you design the function foo so that it shouldn't modify string. One year later you (or a co-worker) need to make some updates to the function. Meanwhile you have forgotten (or your co-worker didn't realize) that foo shouldn't change string and therefore do stuff like string[8] = 'A'. If you originally declared foo using const, the compiler will issue an error for that code and thereby help you to avoid mistakes.
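As a small illustration of that last point, consider a hypothetical read-only function count_char() (not from the question). The accidental write is left as a comment; uncommenting it should make the compiler reject the code.

```c
#include <stddef.h>

/* Counts occurrences of c in string. The const qualifier promises
 * callers that the characters are only read. */
size_t count_char(const char *string, char c) {
    size_t count = 0;
    for (; *string != '\0'; ++string) {   /* moving the pointer is fine */
        if (*string == c)
            ++count;
        /* *string = 'A';    error: assignment to a const object */
    }
    return count;
}
```

Because the parameter is const char *, it is also safe to call this with a string literal.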

How Should I Define/Declare String Constants

I've always used string constants in C as one of the following
char *filename = "foo.txt";
const char *s = "bar"; /* preferably this or the next one */
const char * const s3 = "baz";
But, after reading this, now I'm wondering, should I be declaring my string constants as
const char s4[] = "bux";
?
Please note that linked question suggested as a duplicate is different because this one is specifically asking about constant strings. I know how the types are different and how they are stored. The array version in that question is not const-qualified. This was a simple question as to whether I should use constant array for constant strings vs. the pointer version I had been using. The answers here have answered my question, when two days of searching on SO and Google did not yield an exact answer. Thanks to these answers, I've learned that the compiler can do special things when the array is marked const, and there are indeed (at least one) case where I will now be using the array version.
Pointers and arrays are different. Defining string constants as pointers or as arrays serves different purposes.
When you define a global string constant that is not subject to change, I would recommend you make it a const array:
const char product_name[] = "The program version 3";
Defining it as const char *product_name = "The program version 3"; actually defines 2 objects: the string constant itself, which will reside in a constant segment, and the pointer which can be changed to point to another string or set to NULL.
Conversely, defining a string constant as a local variable would be better done as a local pointer variable of type const char *, initialized with the address of a string constant:
int main() {
    const char *s1 = "world";
    printf("Hello %s\n", s1);
    return 0;
}
If you define this one as an array, depending on the compiler and usage inside the function, the code will make space for the array on the stack and initialize it by copying the string constant into it, a more costly operation for long strings.
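The difference between the two global definitions can be observed with sizeof. This is only a sketch: product_name_ptr, array_bytes() and pointer_bytes() are names made up for the comparison.

```c
#include <stddef.h>

/* One object: the characters themselves. */
const char product_name[] = "The program version 3";

/* Two objects: a modifiable pointer plus the characters it points to. */
const char *product_name_ptr = "The program version 3";

/* Size of the array: string length plus the terminating null byte. */
size_t array_bytes(void) {
    return sizeof product_name;
}

/* Size of the pointer variable, independent of the string length. */
size_t pointer_bytes(void) {
    return sizeof product_name_ptr;
}
```

The pointer variable can also be reassigned or set to NULL at run time, which the array cannot.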
Note also that const char const *s3 = "baz"; is a redundant form of const char *s3 = "baz";. It is different from const char * const s3 = "baz"; which defines a constant pointer to a constant array of characters.
Finally, string constants are immutable and as such should have type const char []. The C Standard purposely allows programmers to store their addresses into non-const pointers, as in char *s2 = "hello";, to avoid producing warnings for legacy code. In new code, it is highly advisable to always use const char * pointers to manipulate string constants. This may force you to declare function arguments as const char * when the function does not change the string contents. This process is known as constification and avoids subtle bugs.
Note that some functions violate this const propagation: strchr() does not modify the string received, declared as const char *, but returns a char *. It is therefore possible to store a pointer to a string constant into a plain char * pointer this way:
char *p = strchr("Hello World\n", 'H');
This problem is solved in C++ via overloading. C programmers must deal with this as a shortcoming. An even more annoying situation is that of strtol() where the address of a char * is passed and a cast is required to preserve proper constness.
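One common way to contain the strtol() awkwardness is a thin wrapper that keeps a local char * for the end pointer, so callers only ever see const pointers. The wrapper name parse_long() is hypothetical; this is a sketch, not a standard function.

```c
#include <stdlib.h>

/* Hypothetical const-correct wrapper around strtol(). The local
 * char *end satisfies strtol()'s char ** parameter, so no caller
 * needs to cast away const. */
long parse_long(const char *text, const char **rest) {
    char *end;
    long value = strtol(text, &end, 10);
    if (rest != NULL)
        *rest = end;   /* char * converts to const char * implicitly */
    return value;
}
```

Callers that do not care where parsing stopped can pass NULL for rest.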
The linked article explores a small artificial situation, and the difference demonstrated vanishes if you insert const after * in const char *ptr = "Lorum ipsum"; (tested in Apple LLVM 10.0.0 with clang-1000.11.45.5).
The fact the compiler had to load ptr arose entirely from the fact it could be changed in some other module not visible to the compiler. Making the pointer const eliminates that, and the compiler can prepare the address of the string directly, without loading the pointer.
If you are going to declare a pointer to a string and never change the pointer, then declare it as static const char * const ptr = "string";, and the compiler can happily provide the address of the string whenever the value of ptr is used. It does not need to actually load the contents of ptr from memory, since it can never change and will be known to point to wherever the compiler chooses to store the string. This is then the same as static const char array[] = "string";—whenever the address of the array is needed, the compiler can provide it from its knowledge of where it chose to store the array.
Furthermore, with the static specifier, ptr cannot be known outside the translation unit (the file being compiled), so the compiler can remove it during optimization (as long as you have not taken its address, perhaps when passing it to another routine outside the translation unit). The result should be no differences between the pointer method and the array method.
Rule of thumb: Tell the compiler as much as you know about stuff: If it will never change, mark it const. If it is local to the current module, mark it static. The more information the compiler has, the more it can optimize.
From the performance perspective, this is a fairly small optimization which makes sense for low-level code that needs to run with the lowest possible latency.
However, I would argue that const char s3[] = "bux"; is better from the semantic perspective, because the type of the right hand side is closer to type of the left hand side. For that reason, I think it makes sense to declare string constants with the array syntax.

C dynamic array of strings

I'm trying to write a simple implementation of dynamic arrays of strings in C. Here's my code (minus the includes and main function etc ...):
typedef char* string;
typedef struct {
    string* list;
    size_t size;
    size_t used;
} list;
void initList(list* l, size_t initSize) {
    l->list = malloc(initSize * sizeof(string));
    l->used = 0;
    l->size = initSize;
}
void insertList(list* l, string elem) {
    if (l->used == l->size) {
        l->size *= 2;
        l->list = realloc(l->list, l->size * sizeof(string));
    }
    l->list[l->used++] = elem;
}
My code seems to work as I expect, I'm asking my question because I read that you should use char[] instead of char*.
I read that using typedef char* string declares the string in read-only memory, so trying to modify it causes undefined behaviour.
If so, using the GCC C compiler I don't receive any errors or warnings and the code seems to work when compiled.
The functions for creating and growing the dynamic array were taken from another question on StackOverflow; the original question created a dynamic array of integers.
I'm just curious as to if my code is good/bad practice.
I read that using typedef char* string declares the string in read-only memory, so trying to modify it causes undefined behaviour.
Well, that's nonsense. You might confuse this with something like
char *foo = "foo";
In that case, although foo is declared as a mutable pointer to mutable characters, the characters it points to must NOT be modified. This is because modifying a "string literal" in C is undefined behavior, and implementations typically place literals in read-only memory. It doesn't help that foo isn't const here. A good compiler will warn you, though -- you should only assign string literals to const char *.
If you want a mutable string, initialize a char[] from the literal. This way it's not a pointer, and the compiler knows to place it in mutable memory. But this really only concerns literals.
So there's nothing wrong with using char * as your string type. In fact, that's what C does implicitly for string literals. I'd still have a little objection: Seasoned C-programmers will know about char * et al and they will expect that "plain" strings are char * (or const char *). If you name something string, it should somehow provide more than that. If it doesn't, just don't confuse people and go by char *.
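The literal-versus-array distinction from this answer, as a short sketch. The function capitalized_initial() is invented for the example, and the undefined-behavior line is deliberately left as a comment.

```c
#include <string.h>

/* Modifies a private, writable copy and leaves the literal untouched. */
char capitalized_initial(void) {
    const char *literal = "foo";   /* points at the literal: read only */
    char copy[] = "foo";           /* array initialized from it: writable */

    copy[0] = 'F';                 /* fine, changes only the array */
    /* ((char *)literal)[0] = 'F';    undefined behavior */

    return (strcmp(literal, "foo") == 0) ? copy[0] : '\0';
}
```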

In C, declare and initialize a variable in a small function or just return a response

Coming from a PHP background, I'm used to writing small functions that return a string (or the response from another function) like so:
function get_something(){
return "foo";
}
However, I'm new to C and am trying to figure how to do some really fundamental things like this.
Can people review the following similar functions and tell me how they differ and which one is the best/cleanest to use?
char *get_foo(){
    char *bar;
    bar = "bar";
    return bar;
}
char *get_foo(){
    char *bar = "bar";
    return bar;
}
char *get_foo(){
    char *bar = NULL;
    bar = "bar";
    return bar;
}
char *get_foo(){
    return "bar";
}
Is there any difference between these functions or is this a style issue?
One other thing. If I have two functions and one calls the other, is this alright to do?
char *get_foo(){
return "bar";
}
char *get_taz(){
return get_foo();
}
UPDATE: How would these functions need to change if get_foo() did not return a const char*? What if get_foo() calls another function that has a char* of different lengths?
The four are equivalent, especially the first three ones - the compiler is likely to compile them to exactly the same code. So I'd go for the last one, for being smaller.
Having said that, you're returning what is effectively a const char*, not a char*, so this particular code could break everything, depending on how you use it (if it compiles at all, which you can force anyway). The thing is, you're returning a pointer to a string that isn't dynamically allocated, but part of the executable image. So modifying it could be dangerous.
As a more general rule, never return a pointer to stuff allocated on the stack (i.e. not created using malloc, or new in C++), because as soon as the function ends, the lifetime of that variable also ends and it is destroyed, so you get a pointer to invalid (freed) memory.
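A sketch contrasting the safe and unsafe cases described above. The function names are made up, and the dangling version is left as a comment so the example stays well-defined.

```c
#include <string.h>

/* OK: a string literal has static storage duration, so the pointer
 * stays valid after the function returns. */
const char *get_literal(void) {
    return "bar";
}

/* OK: a static array also lives for the whole program, but every
 * call shares (and here rewrites) the same buffer. */
const char *get_static(void) {
    static char buffer[4];
    strcpy(buffer, "bar");
    return buffer;
}

/* NOT OK (kept as a comment): buffer would be an automatic variable
 * that ceases to exist when the function returns.
 *
 * const char *get_dangling(void) {
 *     char buffer[4] = "bar";
 *     return buffer;
 * }
 */
```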
Differences like this will usually be optimized out by the compiler anyway ... I would vote for :
char *get_foo(){
    char *bar = "bar";
    return bar;
}
or
const char *get_foo(){
return "bar";
}
or something along the lines of the following (but obviously more defensive, and on a GNU system):
char *get_foo(){
return strdup("bar");
}
Depending on future use and expansion of the function. Really, due to optimizations, it is a readability issue, and how you want the string (mutable/not) for future use.
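If the caller needs a mutable string, a slightly more defensive version of the strdup() idea might look like the sketch below. It uses malloc() and memcpy() so it does not depend on strdup() being declared; get_foo_copy() is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated, writable copy of "bar", or NULL if
 * the allocation fails. The caller owns the result and must free() it. */
char *get_foo_copy(void) {
    static const char text[] = "bar";
    char *copy = malloc(sizeof text);
    if (copy != NULL)
        memcpy(copy, text, sizeof text);   /* includes the null byte */
    return copy;
}
```

The trade-off is that every caller now has to check for NULL and remember to free the result.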
Because you are initializing the variable to a constant in the data of the program. I would do things differently if I were creating a string dynamically.
Like others already have stated, the compiler will produce likely the same code for the alternatives. But: are you forced to use C? Why not use C++ where you can use the std::string class. I haven't declared new char arrays for ages - too error-prone. You don't need to learn/master C before going to C++!
I'm always wary of returning a pointer to a variable that exists at a lower scope level. When I first learned C some X-teen years ago, I remember returning a pointer to a variable that was declared with local scope. Before I called printf, the debugger told me everything was normal, but it never printed the right value. What was happening was this: the variable was correct BEFORE the printf call, but local variables are allocated on the stack when a function is called and deallocated upon return. So the variable I had a pointer to existed on the stack BEFORE calling printf, and its memory was reallocated to printf when printf was invoked, overwriting the previous contents.
In your case, the example you've given will assign a pointer into the constants table that is loaded as part of the executable and MIGHT be fine, depending on what else the actual program is doing. But I would recommend keeping the string at a higher scope level to prevent an easy bug from sneaking into your code as you tweak it. Based on the example you've given, you could probably have a string table allocated at the scope above this call, and just assign the variable instead of calling a function.
I.E.
#define FOO 0
#define BAR 1
#define FOOBAR 2
#define BARFOO 3
char *MyFooStrings[4] = {"Foo","Bar","FooBar","BarFoo"};
// Instead: myFoo = get_foo();
myFoo = MyFooStrings[FOO];
Pete
Is there any difference between these functions or is this a style issue?
There is no difference in terms of output. As mentioned by others, the compiler will likely optimize the code anyway. My preference is to use:
char *get_foo(){
    char *bar = "bar";
    return bar;
}
If your return value gets to be more complex than a simple assignment, it helps to have the intermediate variable if you need to step through the code.
One other thing. If I have two functions and one calls the other, is this alright to do?
This is not a problem as long as you ensure that the return types of the two functions are compatible.
UPDATE: How would these functions need to change if get_foo() did not return a const char*? What if get_foo() calls another function that has a char* of different lengths?
get_taz() just has to have a return type that is assignment-compatible with get_foo(). For example, if get_foo() returns an int, then get_taz() has to return something that you can assign an int to - like int, long int, or similar.
A "char* of different lengths" doesn't really mean anything, because a "char *" doesn't really mean "string" - it means "the location of some chars". Whether that location holds three chars or thirty, a "char *" is still a "char *", so this is perfectly OK:
const char *get_zero(void)
{
    return "Zero";
}
const char *get_nonzero(void)
{
    return "The number is non-zero";
}
const char *get_n(int n)
{
    if (n == 0)
    {
        return get_zero();
    }
    else
    {
        return get_nonzero();
    }
}
First off, some of those will cause your program to crash.
The function:
char *get_foo(){
char *bar;
bar = "bar";
return bar;
}
Is incorrect C code (it may not crash, but you never know)
char *bar
allocates one pointer's worth of memory on the stack.
Personally, I would do it like this.
1.
char *get_foo1(void) {
    char *bar;
    bar = malloc(strlen("bar")+1);
    sprintf(bar,"bar");
    return bar;
}
2. Or pass your allocated variable in.
void get_foo2(char **bar) {
    sprintf(*bar, "bar");
}
3. Combine 1 and 2 to give the user options.
When working with strings in C, you almost always need to malloc() memory for them, unless the lengths of the strings are known ahead of time or are very small. Additionally, you can use #2 above to avoid memory allocation, like this:
int main(int argc, char *argv[]) {
    char bar[4];
    char *p = bar; /* get_foo2() takes the address of a char *, not of the array */
    get_foo2(&p);
}
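A variation on option 2 that many C programmers prefer is to pass the buffer together with its size, so the function cannot write past the end. This is a sketch with a hypothetical name, get_foo3(), not part of the answer above.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical get_foo3(): writes "bar" into a caller-supplied buffer.
 * Returns 0 on success, or -1 if the buffer is too small to hold the
 * string and its terminating null byte. */
int get_foo3(char *buffer, size_t size) {
    int written = snprintf(buffer, size, "%s", "bar");
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

A caller would typically declare char bar[4]; and call get_foo3(bar, sizeof bar), checking the return value before using the buffer.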
