Replacing part of string with another string causes segfault - c

I want to do something simple but I've been banging my head on this for too long. I have a string that will always end with a specific "token". In the case below "++". I want to replace the ending token with a shorter token, let's say "#".
strstr returns either NULL or a pointer to the initial ending token.
strcpy should take the pointer returned from strstr and overwrite the "++" with "#\0".
At least that's what I think it should do. Instead, I get an "assignment makes integer from pointer without a cast" warning. I have tried strcpy(*newThing, "#"); and a few other things at this point. The things that don't give errors cause segfaults.
Is it that in this case C is taking the storage for the original string from some immutable space on the stack? Do I need to use one of the "alloc"s? I've tried that too but might have missed something.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main() {
char *thing="ssssssssssssssss++";
char *newThing;
newThing = strstr(thing, "++");
strcpy(newThing, "#");
printf("%s\n", thing);
exit(0);
}

Your problem is that thing is a pointer to a string literal. In the old days you could have gotten away with writing through it, but current compilers store string literals in non-writable memory.
So when you execute your code you get an "Access violation writing location" error (or an equivalent message, such as a segmentation fault).
The simplest fix: declare an automatic array initialized with the string literal, and all is fine:
char thing[]="ssssssssssssssss++";
(an array and a pointer are not exactly the same ...)

The problem is that thing points to a string literal. You could use this:
char thing[]="ssssssssssssssss++";
which will allocate as much space as needed (the compiler automatically figures out the size of the array).
I googled the reason why modifying the string literal fails and found an example of mine that I had forgotten about. Here it is.
It pretty much boils down to this:
String literals can not modify their data, but they can modify their pointer. On the other hand, if you had an array, then the opposite stands true. You can not modify the pointer, but you can modify the data.
An interesting comment on this was: “String literals have a type. It’s array of char. And while you said “String literals can not modify their data, but they can modify their pointer,” my main problem with this is that they don’t have a pointer like you implied. The pointer and the string literal are actually separate things.”

Related

Is String Literal in C really not modifiable?

As far as I know, a string literal can't be modified for example:
char* a = "abc";
a[0] = 'c';
That would not work since string literal is read-only. I can only modify it if:
char a[] = "abc";
a[0] = 'c';
However, in this post,
Parse $PATH variable and save the directory names into an array of strings, the first answer modified a string literal at these two places:
path_var[j]='\0';
array[current_colon] = path_var+j+1;
I'm not very familiar with C so any explanation would be appreciated.
In programming, there are quite a few rules that are up to you to follow, even though they are not — necessarily — enforced. And "String literals in C are not modifiable" is one of those. So is "Strings returned by getenv should not be modified".
There are some real-world analogies that apply. Here's one: If you're at an intersection, and the light is red, you're not supposed to cross. But, much of the time, if you break the rule, and cross, you might get away with it. You might get a ticket from a policeman — or you might not. You might cause a crash — or you might not. But if you get lucky, and neither of these things happens, that does not imply that crossing the intersection against the red light was okay — it's still quite true that it was very much against the rules.
Similarly, in C, if you write some code that modifies a string literal, or a string returned from getenv, you might get away with it. The compiler might give you a warning or error message — or it might not. Your program might crash — or it might not. But if the program seems to work, that does not imply that these strings are actually modifiable — they're not.
Code blocks from the post you linked:
const char *orig_path_var = getenv("PATH");
char *path_var = strdup(orig_path_var ? orig_path_var : "");
const char **array;
array = malloc((nb_colons+1) * sizeof(*array));
array[0] = path_var;
array[current_colon] = path_var+j+1;
First block:
In the 1st line, getenv() returns a pointer to a string, and orig_path_var points to it. The string that getenv() returns should be treated as read-only, as the behaviour is undefined if the program attempts to modify it.
In the 2nd line, strdup() is called to make a duplicate of this string. strdup() does this by calling malloc() to allocate memory for the length of the string + 1 and then copying the string into that memory.
Since malloc() is used, the string is stored on the heap, which allows us to modify it.
Second block:
In the 1st line we can see that array points to an array of char * pointers. There are nb_colons+1 pointers in the array.
Then in the 2nd line the 0th element of array is initialized to path_var (remember that it is not the read-only string itself, but a copy of it).
In the 3rd line, the current_colonth element of array is set to path_var+j+1. If you don't understand pointer arithmetic, this just means it assigns the address of the j+1th char of path_var to array[current_colon].
As you can see, the code is not operating on a read-only string like the one orig_path_var points to. Instead it uses a copy made with strdup(). This seems to be where your confusion stems from, so take a look at this:
char *strdup(const char *s);
The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3).
The above text shows what strdup() does according to its man page.
It may also help to read the malloc() man page.
In the example
char* a = "abc";
the token "abc" produces a literal object in the program image, and denotes an expression which yields that object's address.
In the example
char a[] = "abc";
The token "abc" serves as an array initializer, and doesn't denote a literal object. It is equivalent to:
char a[] = { 'a', 'b', 'c', 0 };
The individual character values of "abc" are literal data that is recorded somewhere and somehow in the program image, but they are not accessible as a string literal object.
The array a isn't a literal, needless to say. Modifying a doesn't constitute modifying a literal, because it isn't one.
Regarding the remark:
That would not work since string literal is read-only.
That isn't accurate. The ISO C standard (no version of it to date) doesn't specify any requirements for what happens if a program tries to modify a string literal. It is undefined behavior. If your implementation stops the program with some diagnostic message, that's because of undefined behavior, not because it is required.
C implementations are not required to support string literal modification, which has benefits such as:
Standard-conforming C programs can be translated into images that can be burned into ROM chips, such that their string literals are accessed directly from that ROM image without having to be copied into RAM on start-up.
Compilers can condense the storage for string literals by taking advantage of situations when one literal is a suffix of another. The expression "string" + 2 == "ring" can yield true. Since a strictly conforming program will not do something like "ring"[0] = 'w', due to that being undefined behavior, such a program will thereby avoid falling victim to the surprise of "string" unexpectedly turning into "stwing".
There are several reasons why you had better not modify them:
First, the operating system and/or the compiler can enforce the non-writable property of string literals by putting them in read-only memory (e.g. ROM) or in the .text segment.
Second, the compiler is allowed to merge string literals, so if you modify one (and succeed), you can get surprises later because other literals (merged because, e.g., one of them is a suffix of the other) change for no apparent reason.
If you need an initialized string that is modifiable, you can get one by declaring an array, as in the following (which you can freely modify):
char array[100] = "abc"; // initialized to { 'a' ,'b', 'c', '\0',
// /* and 96 more '\0' characters */
// };

Modifying string in function C

Let's say I want to modify a char array using a function.
I always see people using malloc, calloc, or pointers to modify int, char, or 2D arrays.
Am I right that a string can be returned from a function only if I use malloc, create that array pointer, and return it? Then why not get or alter the string by passing it as a function parameter?
Isn't my demonstration, which takes a char array as a parameter, easier than allocating and freeing? Is my concept wrong, or why do I never see people passing arrays to functions? I only see code that passes "char *array", not "char array[]", and uses malloc etc., while this method of altering a char array looks easy to me. Am I missing something?
#include <stdio.h>
void change(char array[]){
array[0]='K';
}
int main(){
char array[]="HEY";
printf("%s\n", array);
change(array);
printf("%s\n",array );
return 0;
}
If you only need to change existing characters in the string, and the string will be in a variable, and you don't mind the side-effect of your original string being modified, then your solution may be acceptable and indeed easier. But:
What if you want to get a modified string, but also want to retain the original? To avoid destroying an arbitrary-sized original, you need to malloc space, make a copy, and modify that.
And what if you want to extend the string? If your change is to add " YOU" to the string, it can't modify the original because there's no space for it. It would cause a buffer overflow, since there are only 4 bytes allocated for "HEY" (three letters plus the null terminator). Again, the solution involves mallocing space to work with.
Functions that make changes using your technique typically need a size or length parameter to avoid overflowing the array and causing a crash and a potential security risk. But although that avoids the overflow, there's still the question of what happens if there's not enough space: Silently drop some data? Pass back a flag or special value to indicate there wasn't enough space, and expect the caller to handle it? In the long run, it ends up easier to write it right the first time, and malloc/calloc the space and deal with having to free it up later and all that.

how do you change a single char in a 2D array (C)

How can I change a single character in a 2D array in C? I have tried but can't get it to compile...
char *words[50][20];
words[0][0] = "hello";
Now how can I change the 'h' to 'j' to make it "jello"?
You shouldn't try that, because modifying a string literal is undefined behavior. The reasonable thing to do is this:
const char *p = "Hello";
words[0][0]=malloc(strlen(p)+1);
if(words[0][0]==NULL){
perror("malloc");
exit(1);
}
memcpy(words[0][0],p,strlen(p)+1);
Remember that you have declared a 2D array of char *; that is why I first allocated memory using malloc and then copied the string literal into it. All of this can be done with the POSIX-specified strdup:
words[0][0]=strdup("Hello");
The C standard explicitly states that modifying a string literal is undefined behavior. You should not use the code you have written, for the reason stated above.
After making these changes, you can write words[0][0][0]='j', and that would be correct code.
Also reconsider your design carefully. We seldom need a 2D array of char *. Do you need one here? If not, try to make the design simpler with smaller constructs.
char *words[50];
Now you can make each pointer point to a word with a different number of letters. The code would be quite similar to the earlier case, but instead of using words[0][0] you would use words[0], something like
words[0]=malloc(strlen(p)+1);
...
Or words[0]=strdup("Hello");.
The section of the standard that covers string literals is quoted below, from 6.4.5p7 (note that "these arrays" refers to string literals):
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

C - strncpy usage - segfault

char *test={"0x11","0x12","0x13","0x00","0x00"};
void change(char* test1, char* test2){
strncpy(test[3], test1, 4);
strncpy(test[4], test2, 4);
}
chage("0x55","0x66");
I can assign the characters to the array elements directly. However, that will cause a memory leak. That's why I use strncpy() instead.
Please advise if you know how to fix it.
There are at least three ways you can get a segfault here (one suggestion is to enable compiler warnings; they often pick up "stupid mistakes").
The first problem is that test is probably misdeclared; it should probably have been:
char *test[]={"0x11","0x12","0x13","0x00","0x00"};
You initialize a char* with a brace-enclosed list of char* values, which means that test is initialized with the first pointer in that list, so test points to the string literal "0x11". When you then use test[3] as an argument to strncpy, you pass the character '1', which is converted to a pointer (probably to address 0x31). strncpy would then try to write to that address, which is most probably not allowed. You nearly had a fourth reason here: if you had used test[5], you would have asked to access beyond the end of the string, which is disallowed (you can access test[4] because it's the terminating null of the string).
Even if you fix those problems, there is still a problem, because test[3] and test[4] are initialized with string literals. strncpy would then try to modify a string literal, which is undefined behavior. The segfault occurs because the strings that test[3] and test[4] point to reside in read-only memory (allowing them to be in read-only memory is one reason why modifying string literals is undefined behavior).
What you could have done instead is make sure that test holds writable copies, which is maybe not that straightforward in C. One common solution is to have a function (that you have to call manually) that sets up test, and one that tears it down:
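/* Replace each string-literal pointer in test with a writable heap copy. */
void init_test(void) {
    size_t j;
    for(j=0; j<sizeof(test)/sizeof(test[0]); j++)
        test[j] = strdup(test[j]);
}
/* Free the copies made by init_test(). */
void init_fini(void) {
    size_t j;
    for(j=0; j<sizeof(test)/sizeof(test[0]); j++)
        free(test[j]);
}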
The other answer gives one good reason (never attempt to modify the contents of a string literal, which str(n)cpy does here).
I'm not even sure you want to represent "characters" as strings, especially as the first declaration doesn't work well (it initializes a single string pointer from a list of strings).
It will take some work to fix this, but you should begin by replacing "0x11" with '\x11' (i.e. actually use characters), replace strncpy with plain assignment (characters are a scalar type that can be assigned directly), and finally change the parameters of the change function to take characters instead of strings.

C's strtok() and read only string literals

char *strtok(char *s1, const char *s2)
repeated calls to this function break string s1 into "tokens"--that is the string is broken into substrings, each terminating with a '\0', where the '\0' replaces any characters contained in string s2. The first call uses the string to be tokenized as s1; subsequent calls use NULL as the first argument. A pointer to the beginning of the current token is returned; NULL is returned if there are no more tokens.
Hi,
I have been trying to use strtok just now and found out that if I pass in a char* into s1, I get a segmentation fault. If I pass in a char[], strtok works fine.
Why is this?
I googled around and the reason seems to be something about how char* is read only and char[] is writeable. A more thorough explanation would be much appreciated.
What did you initialize the char * to?
If something like
char *text = "foobar";
then you have a pointer to some read-only characters
For
char text[7] = "foobar";
then you have a seven element array of characters that you can do what you like with.
strtok writes into the string you give it - overwriting the separator character with null and keeping a pointer to the rest of the string.
Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.
Also, because strtok keeps a reference to the rest of the string, it's not reentrant: you can use it on only one string at a time. It's best avoided, really. Consider strsep(3) instead; see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string, so it has the same read-only/segfault issue).
An important point that's inferred but not stated explicitly:
Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstanding in underlying mechanisms so I like to make things as plain as possible.
As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:
int x = 0;
The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.
When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.
Note the subtle difference here: the memory location that reference point x holds is constant (and cannot be changed). However, the value stored at that location can be changed. You do it in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate commands to the compiler.
When you have a statement like:
char *name = "Tom";
The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.
But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and the null terminator '\0'. Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.
When your program is run, the final step occurs: setting the pointer to char's value (which isn't constant) to the memory address of that automatically created string (which is constant).
So a char * is not read-only. Only a const char * is read-only. But your problem in this case isn't that char *s are read-only, it's that your pointer references a read-only region of memory.
I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.
I hope this was helpful. ;)
I blame the C standard.
char *s = "abc";
could have been defined to give the same error as
const char *cs = "abc";
char *s = cs;
on grounds that string literals are unmodifiable. But it wasn't, it was defined to compile. Go figure. [Edit: Mike B has figured it out: "const" didn't exist at all in K&R C. ISO C, plus every version of C and C++ since, has wanted to be backward-compatible. So it has to be valid.]
If it had been defined to give an error, then you couldn't have got as far as the segfault, because strtok's first parameter is char*, so the compiler would have prevented you passing in the pointer generated from the literal.
It may be of interest that there was at one time a plan in C++ for this to be deprecated (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc). But 12 years later I can't persuade either gcc or g++ to give me any kind of warning for assigning a literal to non-const char*, so it isn't all that loudly deprecated.
[Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]
In brief:
char *s = "HAPPY DAY";
printf("\n %s ", s);
s = "NEW YEAR"; /* Valid */
printf("\n %s ", s);
s[0] = 'c'; /* Invalid */
If you look at your compiler documentation, odds are there is an option you can set to make those strings writable.

Resources