Why do we need strdup()?

While I was working on an assignment, I learned that we should not use assignments such as:
char *s = "HELLO WORLD";
Programs using this syntax are prone to crashing.
I tried the following:
void fun(char *temp)
{
    // do some operation on temp
    // print temp
}
fun("HELLO WORLD");
Even the above works (though the output is compiler- and standard-specific).
Instead, we should use strdup() or const char *.
I have tried reading other similar questions on the site, but I could not understand WHY THE ABOVE CODE SHOULDN'T WORK.
Is memory allocated? And what difference does const make?

Let's clarify things a bit. You don't ever specifically need strdup. It is just a function that allocates a copy of a string on the heap. The same result can be achieved in many different ways, including with stack-based buffers. What you need is the result: a mutable copy of the string.
The reason the code you've listed is dangerous is that it passes what is really a constant string, in the form of a string literal, into a slot that expects a mutable string. This is unfortunately allowed by the C standard, but it is inherently dangerous. Writing to a string literal is undefined behavior: it may produce unexpected results and often crashes. The strdup function fixes the problem because it creates a mutable copy, which can safely be passed where a mutable string is expected.
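A minimal sketch of that fix, assuming a POSIX system: strdup() is specified by POSIX and only joined ISO C in C23, hence the feature-test macro. The helper names `shout` and `shouted_copy` are hypothetical; `shout` stands in for a `fun` that modifies its argument.

```c
#define _POSIX_C_SOURCE 200809L /* declare strdup() in strict ISO builds */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for fun(): upper-cases the string in place,
   so it must be given writable memory, never a string literal. */
static void shout(char *temp)
{
    for (; *temp != '\0'; ++temp)
        *temp = (char)toupper((unsigned char)*temp);
}

/* Returns a heap copy of `text`, upper-cased. The caller owns the
   result and must free() it; returns NULL if allocation fails. */
char *shouted_copy(const char *text)
{
    char *copy = strdup(text); /* mutable copy on the heap */
    if (copy != NULL)
        shout(copy);           /* safe: writes go to our own copy */
    return copy;
}
```

Calling shouted_copy("hello world") is fine. Calling shout("hello world") directly would still compile, because the literal converts to char *, but the write inside it would be undefined behavior.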

String literals are stored in the program's static data, typically in a read-only section. Writing through a pointer to a literal attempts to modify the literal itself, which is undefined behavior and can lead to... strange results at best. Use strdup() to copy them to heap-allocated space, or copy them into a stack array, instead.

String literals may be stored in portions of memory that do not have write privileges. Attempting to write to them will cause undefined behaviour. const means that the compiler rejects any attempt to write through the pointer, guaranteeing that you do not invoke undefined behaviour in this way.

This is a problem in C. Although string literals have type char[N] (and so decay to char *), you can't modify them, so they are effectively const char *.
If you are using gcc, you can use -Wwrite-strings to check that you are using string literals correctly.
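A small sketch of the type question, using only standard C11: sizeof applied to a literal yields the size of the whole array (including the terminating '\0'), while a pointer initialized from it has the size of a pointer. The function names are hypothetical, for illustration only.

```c
#include <stddef.h>

/* "Hello" has array type char[6] in C: five letters plus '\0'.
   (Under gcc -Wwrite-strings it is treated as const char[6].) */
_Static_assert(sizeof("Hello") == 6, "a string literal is an array");

/* sizeof sees the array itself, so this returns 6. */
size_t literal_size(void)
{
    return sizeof("Hello");
}

/* Here the literal decays to a pointer to its first character;
   const makes the compiler reject writes such as p[0] = 'J'. */
size_t pointer_size(void)
{
    const char *p = "Hello";
    return sizeof p;
}
```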

Read my answer on (array & string) Difference between Java and C. It contains the answer to your question in the section about strings.
You need to understand that there's a difference between static and dynamic memory allocation, and that the two do not draw on the same memory areas.

Related

How would one modify a constant string?

To create a string that I can modify, I can do something like this:
// Creates a variable string via array
char string2[] = "Hello";
string2[0] = 'a'; // this is ok
And to create a constant string that cannot be modified:
// Creates a constant string via a pointer
char *string1 = "Hello";
string1[0] = 'a'; // This will give a bus error
My question then is how would one modify a constant string (for example, by casting)? And, is that considered bad practice, or is it something that is commonly done in C programming?
By definition, you cannot modify a constant. If you want to get the same effect, make a non-constant copy of the constant and modify that.
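A minimal sketch of that approach, using a caller-supplied array so no heap allocation is involved. `copy_and_patch` is a hypothetical name; snprintf is used because it always null-terminates, truncating if the buffer is too small.

```c
#include <stdio.h>

/* Copies `src` into the caller's writable buffer `dst` (capacity `cap`
   bytes, truncating if needed), then replaces the first character of
   the copy with `first`. The constant source is never written. */
char *copy_and_patch(char *dst, size_t cap, const char *src, char first)
{
    if (cap == 0)
        return dst;
    snprintf(dst, cap, "%s", src); /* always null-terminates */
    if (dst[0] != '\0')
        dst[0] = first;            /* modify the copy, not the literal */
    return dst;
}
```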
how would one modify a constant string (for example, by casting)?
If by this you mean, how would one attempt to modify it, you don't even need a cast. Your sample code was:
char *string1 = "Hello";
string1[0] = 'a'; // This will give a bus error
If I compile and run it, I get a bus error, as expected, and just like you did. But if I compile with -fwritable-strings, which causes the compiler to put string constants in read/write memory, it works just fine.
I suspect you were thinking of a slightly different case. If you write
const char *string1 = "Hello";
string1[0] = 'a'; // This will give a compilation error
the situation changes: you can't even compile the code. You don't get a Bus Error at run-time, you get a fatal error along the lines of "read-only variable is not assignable" at compile time.
Having written the code this way, one can attempt to get around the const-ness with an explicit cast:
((char *)string1)[0] = 'a';
Now the code compiles, and we're back to getting a Bus Error. (Or, with -fwritable-strings, it works again.)
is that considered bad practice, or is it something that is commonly done in C programming
I would say it is considered bad practice, and it is not something that is commonly done.
I'm still not sure quite what you're asking, though, or if I've answered your question. There's often confusion in this area, because there are typically two different kinds of "constness" that we're worried about:
whether an object is stored in read-only memory
whether a variable is not supposed to be modified, due to the constraints of a program's architecture
The first of these is enforced by the OS and by the MMU hardware. It doesn't matter what programming-language constructs you did or didn't use -- if you attempt to write to a read-only location, it's going to fail.
The second of these has everything to do with software engineering and programming style. If a piece of code promises not to modify something, that promise may let you make useful guarantees about the rest of the program. For example, the strlen function promises not to modify the string you hand it; all it does is inspect the string in order to compute its length.
Confusingly, in C at least, the const keyword has mostly to do with the second category. When you declare something as const, it doesn't necessarily (and in fact generally does not) cause the compiler to put the something into read-only memory. All it does is let the compiler give you warnings and errors if you break your promise -- if you accidentally attempt to modify something that elsewhere you declared as const. (And because it's a compile-time thing, you can also readily "cheat" and turn off this kind of constness with a cast.)
But there is read-only memory, and these days, compilers typically do put string constants there, even though (equally confusingly, but for historical reasons) string constants do not have the type const char [] in C. But since read-only memory is a hardware thing, you can't "turn it off" with a cast.
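To make the distinction concrete, here is a hedged sketch (the function name is hypothetical, and this is shown only to illustrate the two kinds of constness, not as recommended practice): const on a pointer is a compile-time promise, so casting it away is well defined when the object it points to is an ordinary writable array. The same cast applied to a string literal compiles just as happily but is undefined behavior, because the literal itself may sit in read-only memory.

```c
/* `view` is a read-only view of a writable array. The const is a
   compile-time promise about this pointer only. */
void patch_through_const_view(char *buf)
{
    const char *view = buf;  /* promise: no writes through `view` */
    /* view[0] = 'J';           would be rejected at compile time */
    ((char *)view)[0] = 'J'; /* defined: buf itself is not const */
}
```

Called as char buf[] = "Hello"; patch_through_const_view(buf);, it leaves buf holding "Jello". Passing a pointer to a literal instead would bring back the bus error.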
You cannot modify the contents of a string literal in a safe or reliable manner in C; it results in undefined behavior. From the C11 standard draft section 6.4.5 p7 concerning string literals:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
Attempting to modify a string literal is undefined behavior. You may get a bus error, as in your case, or the program may not indicate that the write failed at all. That is what undefined behavior means: the language makes no promises at this point.
You could reassign the pointer (losing your reference to the string "Hello"):
char *s1 = "Hello";
printf("%s ", s1);
s1 = "World";
printf("%s\n", s1);

Using own version of memset with char *s = "Hello" vs char s[] = "Hello". Please clarify why one doesn't set memory but will print array? [duplicate]

This question already has answers here: What is the difference between char s[] and char *s?
I would like to understand the difference between declaring a pointer to char and initialising it with a string literal, for example char *s = "Hello";, and the more common way of declaring an array, char s[] = "Hello";.
I recreated memset and tried using it on the first declaration, which resulted in a bus error. I don't understand why, because from what I gather, in C the first way of declaring and initialising an array should be the same as the second: both are stored in stack memory, and a pointer can interact with the array's address when the array is passed by reference to my function.
The problem is that I cannot alter the first version of the array, but I can print it using printf, which suggests that altering it should be possible, because it IS there somewhere in memory and I have access to its address.
I would just like a clear explanation of what is going on under the hood for both formats: what makes one achieve the objective while the other doesn't.
My aim is just to understand as much as I can about C. Your help will be appreciated in getting me to appreciate the beauty of the language and to have a clear picture of how all of its components work together to solve a problem efficiently.
Attached below is my code:
[Image: Format that works with my ft_memset.c function]
[Image: Format that doesn't work, yet is able to print the array]
In char *s = "Hello", "Hello" is a string literal that causes an array of char to be created and set to contain "Hello" and a terminating null character. Because it is a string literal, the rules of C say you should not try to write to it.
The statement also defines s to be a pointer to char and initializes s to point to the first character.
The statement char s[] = "Hello" defines s to be an array of char and initializes it to contain "Hello" and a terminating null character. Because this is an ordinary array, not a string literal, you are allowed to write to it. (In this statement, "Hello" is technically a string literal, but it is used only to initialize the array s, as if by copying its contents into s. s is not the string literal and does not point to the string literal; it is a separate array.)
The compiler is allowed to put the string literal anywhere it likes. It does not have to be on the stack. Because you are getting an error when you try to write to it, it seems the compiler is putting it in the read-only constants section of the program.
When you declare s to be an array, compilers will typically put the array on the stack if s has automatic storage duration (is declared inside a function without static, extern, or a thread-storage qualifier). (Using the stack is not a requirement of C. Some compilers may put the array elsewhere, but this generally happens only in special circumstances, such as when compiling for a very limited environment.)
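The question's ft_memset is only shown as screenshots, so the following is a hypothetical re-implementation with memset's signature, used to show which declaration it can safely be applied to.

```c
#include <stddef.h>

/* Hypothetical memset clone in the spirit of ft_memset: fills the
   first n bytes at b with (unsigned char)c and returns b. */
void *my_memset(void *b, int c, size_t n)
{
    unsigned char *p = b;
    while (n-- > 0)
        *p++ = (unsigned char)c;
    return b;
}
```

With char s[] = "Hello";, my_memset(s, 'x', 3) leaves s holding "xxxlo", because s is an ordinary writable array. With char *t = "Hello";, the same call writes into the string literal, which is undefined behavior (here, the bus error), even though printf can read t without any problem.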
Trying to modify a string literal is undefined behavior (§6.4.5¶7), and that is what you did in the first case.
Whether a particular implementation lets you change it or not is a separate question; the C standard imposes nothing there. It just says that modifying a string literal is undefined behavior.
In practice, most implementations put string literals in a read-only section, making them non-modifiable. That is the problem you saw: you tried to modify one, and that resulted in a bus error.
The first one is just a pointer pointing to the string literal, so everything said above holds for it.
The second one creates an array and initializes it with the contents of the string literal, and this array is modifiable. You are safe to do whatever you want with it (§6.7.9¶14). Depending on where you put it, it may have static storage duration (when placed at file scope) or automatic storage duration (when placed at block scope).
From the very standard quoted earlier - it says from §6.4.5¶6 (Regarding the storage of string literals)
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.
The links are from C11 standard N1570
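A quick sketch of that distinction (the function name is hypothetical): arrays initialized from identical literals are separate, writable objects, each holding its own copy of the characters.

```c
#include <string.h>

/* Each array gets its own copy of the literal's contents (C11
   6.7.9p14), so the arrays are distinct objects even though they
   were initialized from identical literals. Returns 1 if so. */
int arrays_are_independent(void)
{
    char a[] = "Hello";
    char b[] = "Hello";
    a[0] = 'J';                     /* touches only a */
    return sizeof a == 6            /* array size includes '\0' */
        && a != b                   /* different objects */
        && strcmp(a, "Jello") == 0
        && strcmp(b, "Hello") == 0; /* b is unchanged */
}
```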

How to distinguish a malloced string from a string literal?

Is there a way (in pure C) to distinguish a malloced string from a string literal, without knowing which is which? Strictly speaking, I'm trying to find a way to check a variable whether it's a malloced string or not and if it is, I'm gonna free it; if not, I'll let it go.
Of course, I can dig through the code backwards and make sure whether the variable is malloced or not, but just in case an easy way exists...
edit: lines added to make the question more specific.
char *s1 = "1234567890"; // string literal
char *s2 = strdup("1234567890"); // malloced string
char *s3;
...
if (someVar > someVal) {
s3 = s1;
} else {
s3 = s2;
}
// ...
// after many, many lines of code and algorithmic branches...
// now I lost track of s3: is it assigned to s1 or s2?
// if it was assigned to s2, it needs to be freed;
// if not, freeing a string literal will pop an error
Is there a way (in pure C) to distinguish a malloced string from a string literal,
Not in any portable way, no. No need to worry though; there are better alternatives.
When you write code in C you do so while making strong guarantees about "who" owns memory. Does the caller own it? Then it's their responsibility to deallocate it. Does the callee own it? Similar thing.
You write code which is very clear about the chain of custody and ownership and you don't run into problems like "who deallocates this?" You shouldn't feel the need to say:
// after many, many lines of code and algorithmic branches...
// now I forgot about s3: was it assigned to s1 or s2?
The solution is: don't forget! You're in control of your code; just look up the page a bit. Design it to be bulletproof against leaking memory out to other functions without a clear understanding that "hey, you can read this thing, but there are no guarantees that it will be valid after X or Y. It's not your memory, treat it as such."
Or maybe it is your memory. Case in point: your call to strdup. strdup lets you know (via documentation) that it is your responsibility to deallocate the string that it returns to you. How you do that is up to you, but your best bet is to limit its scope to be as narrow as possible and to only keep it around for as short a time as necessary.
It takes time and practice for this to become second nature. You will create some projects which handle memory poorly before you get good at it. That's ok; your mistakes in the beginning will teach you exactly what not to do, and you'll avoid repeating them in the future (hopefully!)
Also, as @Lasse alluded to in the comments, you don't have to worry about s3 separately: it is a copy of a pointer, not a copy of the memory it points to. If s3 == s2 and you call free on both s2 and s3, you free the same block twice and end up with undefined behavior.
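One hedged way to make that chain of custody explicit in code is to carry an ownership flag next to the pointer, so the free-or-not decision is recorded where the pointer is chosen rather than reconstructed later. `struct str_ref`, `borrow`, `take` and `str_ref_release` are hypothetical names, not a standard API.

```c
#include <stdbool.h>
#include <stdlib.h>

struct str_ref {
    const char *text;
    bool owned; /* true only if text came from malloc()/strdup() */
};

/* Wraps memory the caller does NOT own, e.g. a string literal. */
static struct str_ref borrow(const char *s)
{
    return (struct str_ref){ s, false };
}

/* Takes ownership of heap memory that must eventually be freed. */
static struct str_ref take(char *s)
{
    return (struct str_ref){ s, true };
}

/* Frees the text only if this reference owns it, then clears the
   reference so that releasing it a second time is harmless. */
static void str_ref_release(struct str_ref *r)
{
    if (r->owned)
        free((void *)r->text); /* we allocated it, so dropping const is fine */
    r->text = NULL;
    r->owned = false;
}
```

Note that copying a struct str_ref also copies its owned flag, so exactly one copy should be released; the flag documents ownership but does not enforce it.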
Here is a practical way:
Although the C-language standard does not dictate this, for all identical occurrences of a given literal string in your code, the compiler generates a single copy within the RO-data section of the executable image.
In other words, every occurrence of the string "1234567890" in your code is translated into the same memory address.
So at the point where you want to deallocate that string, you can simply compare its address with the address of the literal string "1234567890":
if (s3 != "1234567890") // address comparison
free(s3);
To emphasize this again, it is not imposed by the C-language standard, but it is practically implemented by any decent compiler of the language.
UPDATE:
The above is merely a practical trick referring directly to the question at hand (rather than to the motivation behind this question).
Strictly speaking, if you've reached a point in your implementation where you have to distinguish between a statically allocated string and a dynamically allocated string, then I would tend to guess that your initial design is flawed somewhere along the line.
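One way to follow that advice, sketched under the assumption that every caller can afford a copy: make every branch return heap memory, so the question of which string to free never arises. `dup_string` and `choose_string` are hypothetical names; `dup_string` is a portable substitute for strdup.

```c
#include <stdlib.h>
#include <string.h>

/* Portable stand-in for strdup(): returns a heap copy of s,
   or NULL if allocation fails. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1; /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical rewrite of the question's branch: whichever string is
   chosen, the caller receives its own copy and frees it exactly once. */
char *choose_string(int someVar, int someVal, const char *s2)
{
    return dup_string(someVar > someVal ? "1234567890" : s2);
}
```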

When assigning a string to a pointer does it reserve space memory

I have defined a function which has a pointer to char as its argument. When passing a constant string to this function's pointer parameter, does the string reserve a dedicated space in memory, or can another object overwrite it without using a pointer to access that space?
Defining my function
void myFunction(char *p){
// some instructions
}
Passing a constant string to this function
myFunction("Some Text");
Really, you should write void myFunction(const char *p){ because, whatever you do, you must not attempt to modify the string pointed at by p. To do so is undefined behaviour.
This is because the string "Some Text" will typically be placed in read-only memory when the program is compiled, and a pointer to it will be passed as p.
That call will pass the address of a string literal to the function, and that address is fixed at compile time (or when dynamic linking happens at program start, but from the point of view of this question, it's the same). What happens if something tries to modify the contents of that memory depends on the operating system. In many (most? all?) modern PC operating systems, the string literal will be in a read-only memory area, and if the application tries to modify it, there will be a segmentation fault. In other (older, embedded, etc.) operating systems or CPU architectures, there might not even be a way to make some memory read-only, and then the string literals could be modified, but it might still cause unexpected results (see the next paragraphs). Modifying string literals is undefined behaviour in the C standard.
Whether that memory is "dedicated" is a matter of interpretation. If you have two string literals like "foobar" and "bar", then the compiler is allowed to make them overlap, so the "foobar" literal points to the 'f' while the "bar" literal points to the 'b' in the same memory space. Unless you are doing some very "clever" pointer comparisons, this should make no difference, because both are in read-only memory.
There are some ways the string could get modified: it would be possible to write code running with kernel privileges that modifies the contents of that "read-only" memory while the program is running. It would also be possible to alter the compiler so that string literals are not in read-only memory, and then they could be modified without the program crashing. And for a non-runtime, permanent change, it would be possible to simply edit the program binary with a hex editor and change the string. But none of this is a concern in regular application development.
Generally speaking, if you are passing a pointer to a function in C (or C++, where this applies to references too) and the function does not modify the pointed-to object, make it a pointer to const (or a const reference in C++), like const char * in this case. The caller might have a const object, and you don't want to prevent them from calling the function with it. And if the function actually is going to modify the argument, then you want the compiler to stop the caller from passing a const object to it.
Yes, it does reserve a dedicated space in memory.
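A short sketch of that guideline; the function names are hypothetical. The read-only function can be called with "Some Text" directly, while the modifying one should only ever receive a writable array.

```c
#include <ctype.h>
#include <stddef.h>

/* Only reads its argument, so it takes const char *: callers may pass
   string literals, const data, or ordinary arrays. */
size_t count_upper(const char *p)
{
    size_t n = 0;
    for (; *p != '\0'; ++p)
        if (isupper((unsigned char)*p))
            ++n;
    return n;
}

/* Writes to its argument, so it takes plain char *: passing a
   const char * here draws a compiler diagnostic. */
void replace_char(char *p, char from, char to)
{
    for (; *p != '\0'; ++p)
        if (*p == from)
            *p = to;
}
```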
No other object can override it without using a pointer to access this specified space.
No, it doesn't allocate memory for the string. The pointer to the start of the string is passed to the function (on the stack or in a register, depending on the calling convention). If you try to modify the string, you'll likely get a segfault; you can, however, modify the pointer.
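To illustrate the last point with a hypothetical helper: the parameter is a local copy of the caller's pointer, so the function may move it around freely without touching the characters or the caller's own variable.

```c
/* p is a local copy of the caller's pointer: advancing it changes
   neither the characters nor the caller's variable. */
const char *skip_spaces(const char *p)
{
    while (*p == ' ')
        ++p;
    return p;
}
```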

changing a const char *str1 = "abc"

Is this good practice? The code compiles and runs, but I wonder whether it is a good practice to emulate.
In C code,
we write const char *str1 = "abc";
then later, let's say there is a pointer variable char *str2 that points to dynamically allocated memory
and then we do str1 = str2 so now both str1 and str2 point to dynamically allocated memory
So now we have lost track of any pointer to "abc". Though we may not need it in this code, I wonder what the best recommended way to handle this is.
The overall problem is that we need a string that is initially set to "abc" and later, depending on user input, we may want to use the string supplied by the user.
It's absolutely fine. const char *str1 means "a pointer that can be modified, to character data that cannot be modified (through this pointer)".
So, you can point str1 at any string you like, and it makes sense to "reseat" it to point at different strings at different times.
Obviously if your code is complicated enough, you can make it difficult for a reader to work out what the variable currently contains, but that's true of all variables. For example you want to be careful with pointers that sometimes point at string literals and sometimes point at dynamically-allocated memory, because it might not always be clear whether the pointer should be freed.
If you wanted str1 to always point at the same string, you would define it const char * const str1 (or char const *const str1 in order to make the position of the const always consistent). That's not what you want in this case, and the fact that you haven't declared str1 const indicates as much to the reader.
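A minimal sketch of the question's scenario under that reading; `pick_string` and `default_str` are hypothetical names. The reseatable const char * handles "default, unless the user supplied something", while a const char * const holds the default that must never be re-pointed.

```c
#include <stddef.h>

/* Const pointer to const char: neither the characters nor the
   pointer can change (default_str = "x"; would not compile). */
static const char *const default_str = "abc";

/* Start with the default and re-point str1 at the user's string if
   one was given. Only the pointer changes; neither string is
   modified, and nothing here needs to be freed. */
const char *pick_string(const char *user_input)
{
    const char *str1 = default_str; /* pointer may be reseated */
    if (user_input != NULL && user_input[0] != '\0')
        str1 = user_input;
    return str1;
}
```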
Losing the pointer to the string literal will not lead to memory leak, so what you do is safe in that aspect.
The string literal "abc" is not dynamically allocated, so there is nothing that can leak in this situation.
String literals are a part of the "program image" that gets loaded into memory at startup by the executable loader of the operating system. The space that this image occupies is reclaimed by the operating system once the process is over. Of course this is not quite accurate since there are techniques like demand paging and copy-on-write, but they are irrelevant for that case.
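A small sketch of that lifetime guarantee (the function name is hypothetical): because a string literal has static storage duration, a pointer to it stays valid after the function that mentions it returns, and there is nothing to free.

```c
/* A string literal has static storage duration (C11 6.4.5p6): it
   exists for the whole run of the program, so returning a pointer
   to it is safe and the caller must not free() it. */
const char *default_name(void)
{
    return "abc";
}

/* By contrast, this would return a dangling pointer, because buf
   ceases to exist when the function returns:
   const char *bad_name(void) { char buf[] = "abc"; return buf; } */
```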
It would be a problem if you left const out of that definition: without it, the compiler would let you attempt to modify a piece of memory that is usually stored in a read-only area of the process, and undefined behaviour would result.
