This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why does this Seg Fault?
What is the difference between char a[] = “string”; and char *p = “string”;
Trying to understand why s[0]='H' fails. I'm guessing this has something to do with the data segment in the process memory but maybe someone better explain this?
void str2 (void)
{
char *s = "hello";
printf("%s\n", s);
s[0] = 'H'; //maybe this is a problem because content in s is constant?
printf("%s\n", s);
}
int main()
{
str2();
return 0;
}
It's wrong because the C standard says that attempting to modify a string literal gives undefined behavior.
Exactly what will happen can and will vary. In some cases it'll "work" -- the content of the string literal will change to what you've asked (e.g., back in the MS-DOS days, it usually did). In other cases, the compiler will merge identical string literals, so something like:
char *a = "1234";
char *b = "1234";
a[1] = 'a';
printf("%s\n", b);
...would print out 1a34, even though you never explicitly modified b at all.
In still other cases (including most modern systems) you can expect the attempted write to fail completely and some sort of exception/signal to be thrown instead.
You are trying to modify a string literal, which resides in implementation defined read only memory thereby causing an Undefined Behavior. Note that the undefined behavior does not warrant that your program crashes but it might show you any behavior.
Good Read:
What is the difference between char a[] = ?string?; and char *p = ?string?;?
I think this behavior, a good or stern compiler should not allow since this is (char *s = "hello") pointer to constant i.e. modifying the contents will lead to undefined behavior if the compiler will not throw any error on this
Related
I was reading up on common C pitfalls and came up to this article on some famous Uni website. (It is the 2nd link that comes up on google).
The last example on that page is,
// Memory allocation on the stack
void b(char **p) {
char * str="print this string";
*p = str;
}
int main(void) {
char * s;
b(&s);
s[0]='j'; //crash, since the memory for str is allocated on the stack,
//and the call to b has already returned, the memory pointed to by str
//is no longer valid.
return 0;
}
That explanation in the comment got me thinking then, that, isn't the memory for string literals not static?
Isn't the actual error there then that you are not supposed to modify string literals, because it is undefined behavior? Or are the comments there correct and my understanding of that example is wrong?
Upon searching further, I saw this question: referencing a char that went out of scope and I understood from that question that, the following is valid code.
#include <malloc.h>
char* a = NULL;
{
char* b = "stackoverflow";
a = b;
}
int main() {
puts(a);
}
Also this question agrees with the other stackoverflow question and my thinking, but opposes the comment from that website's code.
To test it, I tried the following,
#include <stdio.h>
#include <malloc.h>
void b(char **p)
{
char * str = "print this string";
*p = str;
}
int main(void)
{
char * s;
b(&s);
// s[0]='j'; //crash, since the memory for str is allocated on the stack,
//and the call to b has already returned, the memory pointed to by str is no longer valid.
printf("%s \n", s);
return 0;
}
which as expected does not give a segmentation fault.
Standard says (emphasize is mine):
6.4.5 String literals
[...] The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. [...]
[...] If the program attempts to
modify such an array, the behavior is undefined. [...]
No, you misunderstand the reason for crash. String literals have static duration, meaning that they exist for the lifetime of the program. Since your pointer points to the literal, you can use it anytime.
The reason for the crash is the fact that string literals are read-only. In fact char* x = "" is an error in C++, as it should be const char* x = "". They are read-only from language perspective, and any attempt to modify them would lead to undefined behavior.
In practical terms, they are often put in the read-only segment, so any attempt at modification triggers a GPF - general protection fault. Usual response to GPF is a program termination - and this is what you are witnessing with your application.
String literals are placed in general in rodata section (read-only) within the ELF file, and under Linux\Windows\Mac-OS they will end up in a memory region which will generate a fault when written to (configured so using MMU or MPU by the OS upon loading)
This question already has an answer here:
Is modification of string literals undefined behaviour according to the C89 standard?
(1 answer)
Closed 7 years ago.
So I've been playing around with C lately and have been trying to understand the intricacies of passing by value/reference and the ability to manipulate a passed-in variable inside a function. I've hit a road block, however, with the following:
void modifyCharArray(char *input)
{
//change input[0] to 'D'
input[0] = 'D';
}
int main()
{
char *test = "Bad";
modifyCharArray(test);
printf("Bad --> %s\n", test);
}
So the idea was to just modify a char array inside a function, and then print out said array after the modification completed. However, this fails, since all I'm doing is modifying the value of input that is passed in and not the actual memory address.
In short, is there any way I can take in a char *input into a function and modify its original memory address without using something like memcpy from string.h?
In short, is there any way I can take in a char *input into a function and modify its original memory address without using something like memcpy from string.h?
Yes, you can. Your function modifyCharArray is doing the right thing. What you are seeing is caused by that fact that
char *test = "Bad";
creates "Bad" in read only memory of the program and test points to that memory. Changing it is cause for undefined behavior.
If you want to create a modifiable string, use:
char test[] = "Bad";
This question already has answers here:
What is the difference between char s[] and char *s?
(14 answers)
Closed 9 years ago.
void main() {
char *x;
x="abc";
*x='1';
}
Why it comes with error "Access violation writing location"?
I cannot assign value to x by *x='1'?
Modifying string literals leads to undefined behavior, try using char arrays instead:
int main() {
char x[] = "abc";
*x ='1';
}
Also note you should use int main().
Or if you prefer to use pointers, use this a little redundant example:
int main() {
char x[] = "abc";
char *y = x;
*y ='1';
}
Thats wrong because you are attempting to modify a string literal. It is created in readonly mode and if you will try to change that then it would be an access violation and hence result in error.
As a solution as to how it can be achieved you can try using char arrays
The application is loaded in several memory regions (memory pages), code read-only executable (program counter can run in it), and string literals might ideally go into a read-only region.
Writing to it would give an access violation. In fact is nice that you get that violation, are you running Windows? That would possitively surprise me.
consider the following code:
t[7] = "Hellow\0";
s[3] = "Dad";
//now copy t to s using the following strcpy function:
void strcpy(char *s, char *t) {
int i = 0;
while ((s[i] = t[i]) != '\0')
i++;
}
the above code is taken from "The C programming Language book".
my question is - we are copying 7 bytes to what was declared as 3 bytes.
how do I know that after copying, other data that was after s[] in the memory
wasn't deleted?
and one more question please: char *s is identical to char* s?
Thank you !
As you correctly point out, passing s[3] as the first argument is going to overwrite some memory that could well be used by something else. At best your program will crash right there and then; at worst, it will carry on running, damaged, and eventually end up corrupting something it was supposed to handle.
The intended way to do this in C is to never pass an array shorter than required.
By the way, it looks like you've swapped s and t; what was meant was probably this:
void strcpy(char *t, char *s) {
int i = 0;
while ((t[i] = s[i]) != '\0')
i++;
}
You can now copy s[4] into t[7] using this amended strcpy routine:
char t[] = "Hellow";
char s[] = "Dad";
strcpy(t, s);
(edit: the length of s is now fixed)
About the first question.
If you're lucky your program will crash.
If you are not it will keep on running and overwrite memory areas that shouldn't be touched (as you don't know what's actually in there). This would be a hell to debug...
About the second question.
Both char* s and char *s do the same thing. It's just a matter of style.
That is, char* s could be interpreted as "s is of type char pointer" while char *s could be interpreted as "s is a pointer to a char". But really, syntactically it's the same.
That example does nothing, you're not invoking strcpy yet. But if you did this:
strcpy(s,t);
It would be wrong in several ways:
The string s is not null terminated. In C the only way strcpy can know where a string ends is by finding the '\0'. The function may think that s is infinite and it might corrupt your memory and make the program crash.
Even if was null terminated, as you said the size of s is only 3. Because of the same cause, strcpy would write memory beyond where s ends, with maybe catastrophic results.
The workaround for this in C is the function strncpy(dst, src, max) in which you specify the maximum number of chars to copy. Still beware that this function might generate a not null terminated string if src is shorter than max chars.
I will assume that both s and t (above the function definition) are arrays of char.
how do I know that after copying, other data that was after s[] in the memory wasn't deleted?
No, this is worse, you are invoking undefined behavior and we know this because the standard says so. All you are allowed to do after the three elements in s is compare. Assignment is a strict no-no. Advance further, and you're not even allowed to compare.
and one more question please: char s is identical to char s?
In most cases it is a matter of style where you stick your asterix except if you are going to declare/define more than one, in which case you need to stick one to every variable you are going to name (as a pointer).
a string-literal "Hellow\0" is equal to "Hellow"
if you define
t[7] = "Hellow";
s[7] = "Dad";
your example is defined and crashes not.
Here's a simple example of a program that concatenates two strings.
#include <stdio.h>
void strcat(char *s, char *t);
void strcat(char *s, char *t) {
while (*s++ != '\0');
s--;
while ((*s++ = *t++) != '\0');
}
int main() {
char *s = "hello";
strcat(s, " world");
while (*s != '\0') {
putchar(*s++);
}
return 0;
}
I'm wondering why it works. In main(), I have a pointer to the string "hello". According to the K&R book, modifying a string like that is undefined behavior. So why is the program able to modify it by appending " world"? Or is appending not considered as modifying?
Undefined behavior means a compiler can emit code that does anything. Working is a subset of undefined.
I +1'd MSN, but as for why it works, it's because nothing has come along to fill the space behind your string yet. Declare a few more variables, add some complexity, and you'll start to see some wackiness.
Perhaps surprisingly, your compiler has allocated the literal "hello" into read/write initialized data instead of read-only initialized data. Your assignment clobbers whatever is adjacent to that spot, but your program is small and simple enough that you don't see the effects. (Put it in a for loop and see if you are clobbering the " world" literal.)
It fails on Ubuntu x64 because gcc puts string literals in read-only data, and when you try to write, the hardware MMU objects.
You were lucky this time.
Especially in debug mode some compilers will put spare memory (often filled with some obvious value) around declarations so you can find code like this.
It also depends on the how the pointer is declared. For example, can change ptr, and what ptr points to:
char * ptr;
Can change what ptr points to, but not ptr:
char const * ptr;
Can change ptr, but not what ptr points to:
const char * ptr;
Can't change anything:
const char const * ptr;
According to the C99 specifification (C99: TC3, 6.4.5, §5), string literals are
[...] used to initialize an array of static storage duration and length just
sufficient to contain the sequence. [...]
which means they have the type char [], ie modification is possible in principle. Why you shouldn't do it is explained in §6:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
Different string literals with the same contents may - but don't have to - be mapped to the same memory location. As the behaviour is undefined, compilers are free to put them in read-only sections in order to cleanly fail instead of introducing possibly hard to detect error sources.
I'm wondering why it works
It doesn't. It causes a Segmentation Fault on Ubuntu x64; for code to work it shouldn't just work on your machine.
Moving the modified data to the stack gets around the data area protection in linux:
int main() {
char b[] = "hello";
char c[] = " ";
char *s = b;
strcat(s, " world");
puts(b);
puts(c);
return 0;
}
Though you then are only safe as 'world' fits in the unused spaces between stack data - change b to "hello to" and linux detects the stack corruption:
*** stack smashing detected ***: bin/clobber terminated
The compiler is allowing you to modify s because you have improperly marked it as non-const -- a pointer to a static string like that should be
const char *s = "hello";
With the const modifier missing, you've basically disabled the safety that prevents you from writing into memory that you shouldn't write into. C does very little to keep you from shooting yourself in the foot. In this case you got lucky and only grazed your pinky toe.
s points to a bit of memory that holds "hello", but was not intended to contain more than that. This means that it is very likely that you will be overwriting something else. That is very dangerous, even though it may seem to work.
Two observations:
The * in *s-- is not necessary. s-- would suffice, because you only want to decrement the value.
You don't need to write strcat yourself. It already exists (you probably knew that, but I'm telling you anyway:-)).