Is this good practice? The code compiles and runs but I wonder if this is a good practice to emulate
In C code,
we write const char *str1 = "abc";
then later, let's say there is a pointer variable char *str2 that points to dynamically allocated memory
and then we do str1 = str2, so now both str1 and str2 point to dynamically allocated memory.
So now we have lost track of any pointer to "abc". In this code we may not need it, but I wonder what the recommended way to handle this is.
The overall problem is that we need a string that is initially set to "abc" and that later, depending on user input, may be replaced by the string supplied by the user.
It's absolutely fine. const char *str1 means "a pointer that can be modified, to character data that cannot be modified (through this pointer)".
So, you can point str1 at any string you like, and it makes sense to "reseat" it to point at different strings at different times.
Obviously if your code is complicated enough, you can make it difficult for a reader to work out what the variable currently contains, but that's true of all variables. For example, you want to be careful with pointers that sometimes point at string literals and sometimes point at dynamically-allocated memory, because it might not always be clear whether the pointer should be freed.
If you wanted str1 to always point at the same string, you would define it const char * const str1 (or char const *const str1 in order to make the position of the const always consistent). That's not what you want in this case, and the fact that you haven't declared str1 const indicates as much to the reader.
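A minimal sketch of the "default, unless the user supplied something" case from the question. The helper name choose_label and its parameter are hypothetical, not from the original code; the point is only that a const char * can be reseated from the literal to another string.

```c
#include <stddef.h>

/* Hypothetical helper: returns the user's text if one was supplied,
   otherwise the default literal. The result is only read through,
   so const char * fits both cases. */
const char *choose_label(const char *user_input)
{
    const char *label = "abc";      /* starts out pointing at the literal */

    if (user_input != NULL && user_input[0] != '\0') {
        label = user_input;         /* reseat the pointer; "abc" itself is untouched */
    }
    return label;
}
```

The function never frees anything. If user_input came from malloc, whoever allocated it keeps the original char * and frees it through that pointer, which avoids the "should this be freed?" ambiguity mentioned above.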
Losing the pointer to the string literal will not lead to a memory leak, so what you do is safe in that respect.
The string literal "abc" is not dynamically allocated, so there is nothing that can leak in this situation.
String literals are a part of the "program image" that gets loaded into memory at startup by the executable loader of the operating system. The space that this image occupies is reclaimed by the operating system once the process is over. Of course this is not quite accurate since there are techniques like demand paging and copy-on-write, but they are irrelevant for that case.
It would be a problem if you didn't put const in that definition. Without it, the compiler would let you attempt to modify memory that is usually stored in a read-only area of the process, and doing so is undefined behaviour.
Related
I have a string, that is only used once when my application launches. Ordinary string literals, eg. "Hello" are static, meaning they're only deallocated when the program ends. I don't want that. They can be deallocated earlier. How do I say, Hey, like, this string literal shouldn't be static. It should be deallocated when the scope ends. How do I do that? For example,
memcpy(GameDir+HomeDirLen, "/.Data", 7);
The "/.Data" is still stored in RAM as the literal even long after this line of code runs. That's a waste, because it's only used once.
With typical implementations, if your program contains the string "/.Data" anywhere, either as a literal or as an initializer for an array of any duration, then the program is going to contain those bytes somewhere in the executable. They'll be loaded (or mapped) into memory when the program loads, and I don't know of any implementation that can free such memory before the program exits. So the other answers so far don't really accomplish what you want.
(If your array was of auto duration, then initializing is typically done under the hood by copying from an anonymous static string. Or it could be done by a sequence of immediate store instructions, which probably uses even more memory.)
So if you really want to ensure that those bytes don't occupy memory for the life of the process, you'll have to get them from somewhere other than the program itself. For instance, you could store the string in a file, open it, and read the string into an auto or malloced array. Then you really will recover the memory when the array goes out of scope or is freed (assuming, of course, that free actually does recover memory in a way that's useful to you). You could also use mmap if your system provides it.
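A rough sketch of the "read it from a file" approach described above. The helper name read_string_from_file is hypothetical, and the sketch assumes the path itself comes from somewhere other than a literal (for example argv), since otherwise the path would occupy the program image just like the original string did.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to max_len bytes from the file at path
   into a freshly malloc'ed, NUL-terminated buffer.
   Returns NULL if the file cannot be opened or allocation fails.
   The caller frees the result once the string is no longer needed. */
char *read_string_from_file(const char *path, size_t max_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }

    char *buf = malloc(max_len + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t n = fread(buf, 1, max_len, fp);  /* n <= max_len */
    buf[n] = '\0';
    fclose(fp);
    return buf;
}
```

Whether the memory is really returned to the system after free depends on the allocator, as noted above.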
On the other hand, modern operating systems usually have virtual memory. So if your string literal is in the read-only data section of the program, then if physical memory becomes tight, the system can simply drop that page of physical memory and use it for something else. If your program should attempt to access that data again, the system will allocate a new page and transparently populate it from the executable file on disk - but if you never access it, that will never happen.
Of course this doesn't help much if your string is really only 7 bytes, because there will be lots of other stuff in that page of memory (a page is commonly 4KB or somewhere around there). But if your string is really big, or you have a lot of such strings, then this effect may work just as well as actually freeing the memory. You may even be able to use various compiler-specific options to ensure that all your only-needed-once strings are placed contiguously in the executable, so that they will all be in the same pages of memory.
I have a string, that is only used once when my application launches.
Ordinary string literals, eg. "Hello" are static, meaning they're only
deallocated when the program ends. I don't want that. They can be
deallocated earlier. How do I say, Hey, like, this string literal
shouldn't be static. It should be deallocated when the scope ends. How
do I do that?
You cannot. All string literals have static storage duration, and that's really the only way they could work. If you have a string literal in your program source that is used in any way, then the program image has to contain the bytes of the literal's representation somewhere among the program data. If the literal appears inside a function, as must be the case in your example, then the representation needs to be retained for use each time the function is called. The same applies to uses at file scope: string literals used there typically are accessible for the entire run of the program.
The exception is string literals used as initializers for (other) character arrays with static storage duration. Such an initialization results in, initially, two identical copies of the same data, at most one of which is actually accessible at run time. There's no use for retaining the data for the literal separately. C does not specify a way for you to express that the literal should not be retained, but your compiler is at liberty to omit the unneeded duplicate at its own discretion, and at least some do.
Compilers may also fold identical string literals, and perhaps even fold literals that just have identical tails, and / or perform other space-saving optimizations. And your compiler is likely to be better than you are at recognizing when and how such optimizations can safely be performed.
This answer does not really match your specific need. I'll leave it for the comments.
You can use a compound literal ... and a pointer to it
char *p = (char[]){"Hello!"}; // vs char *p = "Hello!"
*p = 'C'; // fine here; with char *p = "Hello!" the same write would be illegal
see https://ideone.com/62UNKO
To create a string that I can modify, I can do something like this:
// Creates a variable string via array
char string2[] = "Hello";
string2[0] = 'a'; // this is ok
And to create a constant string that cannot be modified:
// Creates a constant string via a pointer
char *string1 = "Hello";
string1[0] = 'a'; // This will give a bus error
My question then is how would one modify a constant string (for example, by casting)? And, is that considered bad practice, or is it something that is commonly done in C programming?
By definition, you cannot modify a constant. If you want to get the same effect, make a non-constant copy of the constant and modify that.
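A small sketch of "make a non-constant copy and modify that". The helper name copy_with_first_char is hypothetical; it uses malloc and memcpy rather than strdup so that it relies only on standard C library calls.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed, writable copy of src whose
   first character has been replaced by c. The original string (which may
   be a literal) is never written to. Returns NULL if allocation fails.
   The caller frees the result. */
char *copy_with_first_char(const char *src, char c)
{
    size_t len = strlen(src);
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }

    memcpy(copy, src, len + 1);   /* len + 1 copies the terminating NUL too */
    if (len > 0) {
        copy[0] = c;              /* safe: copy is ordinary writable memory */
    }
    return copy;
}
```

For example, copy_with_first_char("Hello", 'a') yields a new string "aello" while the literal "Hello" stays as it was.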
how would one modify a constant string (for example, by casting)?
If by this you mean, how would one attempt to modify it, you don't even need a cast. Your sample code was:
char *string1 = "Hello";
string1[0] = 'a'; // This will give a bus error
If I compile and run it, I get a bus error, as expected, and just like you did. But if I compile with -fwritable-strings, which causes the compiler to put string constants in read/write memory, it works just fine.
I suspect you were thinking of a slightly different case. If you write
const char *string1 = "Hello";
string1[0] = 'a'; // This will give a compilation error
the situation changes: you can't even compile the code. You don't get a Bus Error at run-time, you get a fatal error along the lines of "read-only variable is not assignable" at compile time.
Having written the code this way, one can attempt to get around the const-ness with an explicit cast:
((char *)string1)[0] = 'a';
Now the code compiles, and we're back to getting a Bus Error. (Or, with -fwritable-strings, it works again.)
is that considered bad practice, or is it something that is commonly done in C programming
I would say it is considered bad practice, and it is not something that is commonly done.
I'm still not sure quite what you're asking, though, or if I've answered your question. There's often confusion in this area, because there are typically two different kinds of "constness" that we're worried about:
whether an object is stored in read-only memory
whether a variable is not supposed to be modified, due to the constraints of a program's architecture
The first of these is enforced by the OS and by the MMU hardware. It doesn't matter what programming-language constructs you did or didn't use -- if you attempt to write to a read-only location, it's going to fail.
The second of these has everything to do with software engineering and programming style. If a piece of code promises not to modify something, that promise may let you make useful guarantees about the rest of the program. For example, the strlen function promises not to modify the string you hand it; all it does is inspect the string in order to compute its length.
Confusingly, in C at least, the const keyword has mostly to do with the second category. When you declare something as const, it doesn't necessarily (and in fact generally does not) cause the compiler to put the something into read-only memory. All it does is let the compiler give you warnings and errors if you break your promise -- if you accidentally attempt to modify something that elsewhere you declared as const. (And because it's a compile-time thing, you can also readily "cheat" and turn off this kind of constness with a cast.)
But there is read-only memory, and these days, compilers typically do put string constants there, even though (equally confusingly, but for historical reasons) string constants do not have the type const char [] in C. But since read-only memory is a hardware thing, you can't "turn it off" with a cast.
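To illustrate the two kinds of constness, here is a sketch with two hypothetical functions (the names are mine, not from any library). The first uses const purely as a promise, the way strlen does. The second shows that casting const away is well defined only when the underlying object is actually writable, here an ordinary array rather than a string literal.

```c
#include <stddef.h>

/* const as a promise: this function only inspects s. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c) {
            n++;
        }
    }
    return n;
}

/* The "cheat" with a cast, applied to memory that really is writable. */
char cast_away_const_demo(void)
{
    char buf[] = "Hello";       /* writable array initialised from the literal */
    const char *view = buf;     /* const is only a promise about 'view' */

    ((char *)view)[0] = 'J';    /* defined: buf itself is not a const object */
    return buf[0];              /* now 'J' */
}
```

Had view pointed at the literal "Hello" directly, the same cast would still compile, but the write would be undefined behaviour, for the reasons given above.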
You cannot modify the contents of a string literal in a safe or reliable manner in C; it results in undefined behavior. From the C11 standard draft section 6.4.5 p7 concerning string literals:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
Attempting to modify a string literal is undefined behavior. You may get a bus error, as in your case, or the program may not even indicate that the write failed at all. This is undefined behavior for you - the language makes no promises at this point.
You could reassign the pointer (losing your reference to the string "Hello"):
char *s1 = "Hello";
printf("%s ", s1);
s1 = "World";
printf("%s\n", s1);
Is there a way (in pure C) to distinguish a malloced string from a string literal, without knowing which is which? Strictly speaking, I'm trying to find a way to check a variable whether it's a malloced string or not and if it is, I'm gonna free it; if not, I'll let it go.
Of course, I can dig the code backwards and make sure if the variable is malloced or not, but just in case if an easy way exists...
edit: lines added to make the question more specific.
char *s1 = "1234567890"; // string literal
char *s2 = strdup("1234567890"); // malloced string
char *s3;
...
if (someVar > someVal) {
s3 = s1;
} else {
s3 = s2;
}
// ...
// after many, many lines of code and algorithmic branches...
// now I lost track of s3: is it assigned to s1 or s2?
// if it was assigned to s2, it needs to be freed;
// if not, freeing a string literal will pop an error
Is there a way (in pure C) to distinguish a malloced string from a string literal,
Not in any portable way, no. No need to worry though; there are better alternatives.
When you write code in C you do so while making strong guarantees about "who" owns memory. Does the caller own it? Then it's their responsibility to deallocate it. Does the callee own it? Similar thing.
You write code which is very clear about the chain of custody and ownership and you don't run into problems like "who deallocates this?" You shouldn't feel the need to say:
// after many, many lines of code and algorithmic branches...
// now I forgot about s3: was it assigned to s1 or s2?
The solution is: don't forget! You're in control of your code, just look up the page a bit. Design it to be bulletproof against leaking memory out to other functions without a clear understanding that "hey, you can read this thing, but there are no guarantees that it will be valid after X or Y. It's not your memory, treat it as such."
Or maybe it is your memory. Case in point: your call to strdup. strdup lets you know (via documentation) that it is your responsibility to deallocate the string that it returns to you. How you do that is up to you, but your best bet is to limit its scope to be as narrow as possible and to only keep it around for as short a time as necessary.
It takes time and practice for this to become second nature. You will create some projects which handle memory poorly before you get good at it. That's ok; your mistakes in the beginning will teach you exactly what not to do, and you'll avoid repeating them in the future (hopefully!)
Also, as @Lasse alluded to in the comments, you don't have to worry about s3, which is a copy of a pointer, not of the entire chunk of memory. If s3 holds the same address as s2 and you call free on both, you free the same block twice and end up with undefined behavior.
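One way to make the ownership explicit, sketched under my own assumptions: the struct label type and its three functions are hypothetical names, not an established API. The idea is to record, at the moment the pointer is assigned, whether the memory is owned, instead of trying to work it out later.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: text is what readers use; heap is non-NULL only
   when text points into malloc'ed memory that this label owns. */
struct label {
    const char *text;
    char *heap;
};

/* Borrow a string with static storage duration (e.g. a literal). */
struct label label_from_literal(const char *literal)
{
    struct label l = { literal, NULL };
    return l;
}

/* Take a private, malloc'ed copy of src. On allocation failure
   both members are NULL. */
struct label label_from_copy(const char *src)
{
    struct label l = { NULL, NULL };
    size_t len = strlen(src);

    l.heap = malloc(len + 1);
    if (l.heap != NULL) {
        memcpy(l.heap, src, len + 1);
        l.text = l.heap;
    }
    return l;
}

/* Safe for both kinds: free(NULL) is a no-op. */
void label_release(struct label *l)
{
    free(l->heap);
    l->heap = NULL;
    l->text = NULL;
}
```

With this, the branch in the question picks either label_from_literal("1234567890") or label_from_copy("1234567890"), and the cleanup code calls label_release unconditionally.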
Here is a practical way:
Although the C-language standard does not dictate this, for all identical occurrences of a given literal string in your code, the compiler generates a single copy within the RO-data section of the executable image.
In other words, every occurrence of the string "1234567890" in your code is translated into the same memory address.
So at the point where you want to deallocate that string, you can simply compare its address with the address of the literal string "1234567890":
if (s3 != "1234567890") // address comparison
free(s3);
To emphasize this again, it is not imposed by the C-language standard, but it is practically implemented by any decent compiler of the language.
UPDATE:
The above is merely a practical trick referring directly to the question at hand (rather than to the motivation behind this question).
Strictly speaking, if you've reached a point in your implementation where you have to distinguish between a statically allocated string and a dynamically allocated string, then I would tend to guess that your initial design is flawed somewhere along the line.
I have defined a function which has a pointer to char as its argument. When passing a constant string to this function's pointer, does the string reserves a dedicated space in memory or can any other object override it without using a pointer to access this specified space?
Defining my function
void myFunction(char *p){
// some instructions
}
Passing a constant string to this function
myFunction("Some Text");
Really you should write void myFunction(const char *p){ because, whatever you do, you must not attempt to modify the string pointed at by p. To do so is undefined behaviour.
This is because the string "Some Text" is typically placed in read-only memory when the program is built, and a pointer to it is what gets passed as p.
That call passes the address of a string literal to the function, and that address is fixed at compile time (or at dynamic-link time when the program starts, but for the purposes of this question that is the same thing). What happens if something tries to modify the contents of that memory depends on the operating system. In many (most? all?) modern PC operating systems, the string literal will be in a read-only memory area, and if the application tries to modify it, there will be a segmentation fault. In other (older, embedded, etc.) operating systems or CPU architectures, there might not even be a way to make memory read-only, and then string literals could be modified, but doing so might still cause unexpected results (see the next paragraphs). Modifying a string literal is undefined behaviour according to the C standard.
Whether that memory is "dedicated" is a matter of interpretation. If you have two string literals like "foobar" and "bar", the compiler is allowed to make them overlap, so the "foobar" literal points to the 'f' while the "bar" literal points to the 'b' in the same memory. Unless you are doing some very "clever" pointer comparisons, this should make no difference, because both are in read-only memory.
There are some ways the string could get modified. It would be possible to write code running with kernel privileges that modifies the contents of that "read-only" memory while the program is running. It would also be possible to alter the compiler so that string literals are not in read-only memory, and then they could be modified without the program crashing. And for a permanent change outside of run time, it would be possible to simply edit the program binary with a hex editor and change the string. But none of these is a concern when doing regular application development.
Generally speaking, if you are passing a pointer to function in C (or C++, where this applies to references too), always make it a pointer to const (or const reference in C++), like const char* in this case. The caller might have a const object, and you don't want to prevent them from calling the function with that. And if the function actually is going to modify the argument, then you want the compiler to stop the caller from passing a const object to it.
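A short sketch of that guideline, using two hypothetical functions of my own naming. The read-only one takes const char * and so accepts literals, const arrays and writable buffers alike; the one that modifies its argument takes char *, so the caller is pushed towards passing a writable buffer.

```c
#include <ctype.h>
#include <stddef.h>

/* Only reads through p, so the parameter is a pointer to const. */
size_t count_spaces(const char *p)
{
    size_t n = 0;
    for (; *p != '\0'; p++) {
        if (*p == ' ') {
            n++;
        }
    }
    return n;
}

/* Writes through p, so the parameter is a plain char *.
   The caller must pass writable memory, never a string literal. */
void to_upper_in_place(char *p)
{
    for (; *p != '\0'; p++) {
        *p = (char)toupper((unsigned char)*p);
    }
}
```

count_spaces("Some Text") is fine as written. For to_upper_in_place, the caller would first copy the text into an array, for example char buf[] = "Some Text";, and pass buf.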
Yes, it does reserve a dedicated space in memory.
No other object can override it without using a pointer to access this specified space.
No, it doesn't allocate memory for the string; the pointer to the start of the string is pushed onto the stack. If you try to modify the string, you'll segfault. You can, however, modify the pointer.
While I was working on an assignment, I came to know that we should not use assignments such as :
char *s="HELLO WORLD";
Programs using such syntax are prone to crashing.
I tried and used:
int fun(char *temp)
{
// do sum operation on temp
// print temp.
}
fun("HELLO WORLD");
Even the above works (though the output is compiler and standard specific).
Instead we should try strdup() or use const char *
I have tried reading other similar questions on the blog, but could not get the concept of WHY THE ABOVE CODE SHOULDN'T WORK.
Is memory allocated?? And what difference does const make??
Let's clarify things a bit. You don't ever specifically need strdup. It is just a function that allocates a copy of a char* on the heap. It can be done many different ways, including with stack-based buffers. What you need is the result: a mutable copy of a char*.
The reason the code you've listed is dangerous is that it's passing what is really a constant string, in the form of a string literal, into a slot which expects a mutable string. This is unfortunately allowed in the C standard but is inherently dangerous. Writing to a constant string will produce unexpected results and often crashes. The strdup function fixes the problem because it creates a mutable copy which is placed into a slot expecting a mutable string.
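As a sketch of the stack-based alternative to strdup mentioned above: the function names below are hypothetical, and reverse_in_place merely stands in for whatever the question's fun does to temp. The literal is copied into a caller-supplied writable buffer first, and only the copy is modified.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a function that modifies its argument. */
void reverse_in_place(char *temp)
{
    size_t n = strlen(temp);
    for (size_t i = 0; i < n / 2; i++) {
        char tmp = temp[i];
        temp[i] = temp[n - 1 - i];
        temp[n - 1 - i] = tmp;
    }
}

/* Copies src into the caller's writable buffer (truncating if needed),
   then modifies only the copy. src may safely be a string literal. */
void reversed_copy(const char *src, char *out, size_t out_size)
{
    if (out_size == 0) {
        return;
    }
    snprintf(out, out_size, "%s", src);   /* always NUL-terminates */
    reverse_in_place(out);
}
```

A caller would declare an automatic array such as char buf[32]; and call reversed_copy("HELLO WORLD", buf, sizeof buf);, so nothing has to be freed afterwards.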
String literals are stored in the program's data segment. Writing through a pointer to one attempts to modify the string literal itself, which can lead to... strange results at best. Use strdup() to copy them to heap-allocated space (or copy them into a stack array) instead.
String literals may be stored in portions of memory that do not have write privileges. Attempting to write to them will cause undefined behaviour. const means that the compiler ensures that nothing is written through the pointer, guaranteeing that you do not invoke undefined behaviour in this way.
This is a problem in C. Although string literals have type char[] (and so decay to char *), you can't modify them, so they are effectively const char *.
If you are using gcc, you can use -Wwrite-strings to check if you are using string literals correctly.
Read my answer on (array & string) Difference between Java and C. It contains the answer to your question in the section about strings.
You need to understand that there's a difference between static and dynamic memory allocation, and that the two do not use the same memory areas.