I was reading up on common C pitfalls and came across this article on a well-known university's website (it is the second result on Google).
The last example on that page is:
// Memory allocation on the stack
void b(char **p) {
char * str="print this string";
*p = str;
}
int main(void) {
char * s;
b(&s);
s[0]='j'; //crash, since the memory for str is allocated on the stack,
//and the call to b has already returned, the memory pointed to by str
//is no longer valid.
return 0;
}
That explanation in the comment made me wonder: isn't the memory for string literals static?
Isn't the actual error, then, that you are not supposed to modify string literals because doing so is undefined behavior? Or are the comments correct and my understanding of the example wrong?
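Searching further, I found the question "referencing a char that went out of scope", from which I understood that the following is valid code:
#include <stdio.h>
int main(void) {
    char *a = NULL;
    {
        char *b = "stackoverflow"; /* b points to a string literal (static storage duration) */
        a = b;
    } /* b's lifetime ends here; the literal's does not */
    puts(a); /* still valid */
    return 0;
}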
This question also agrees with the other Stack Overflow question and with my thinking, but contradicts the comment in the website's code.
To test it, I tried the following:
#include <stdio.h>
void b(char **p)
{
char * str = "print this string";
*p = str;
}
int main(void)
{
char * s;
b(&s);
// s[0]='j'; //crash, since the memory for str is allocated on the stack,
//and the call to b has already returned, the memory pointed to by str is no longer valid.
printf("%s \n", s);
return 0;
}
As expected, this does not cause a segmentation fault.
The C standard says:
6.4.5 String literals
[...] The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. [...]
[...] If the program attempts to modify such an array, the behavior is undefined. [...]
No, you misunderstand the reason for the crash. String literals have static storage duration, meaning that they exist for the whole lifetime of the program. Since your pointer points to the literal, you can use it at any time.
The reason for the crash is that string literals are read-only. In fact, char* x = "" is an error in C++ (since C++11); it should be const char* x = "". String literals are read-only from the language's perspective, and any attempt to modify them leads to undefined behavior.
In practice, they are often placed in a read-only segment, so any attempt to modify them triggers a general protection fault (GPF). The usual response to a GPF is program termination, and that is what you are seeing with your application.
String literals are generally placed in the read-only .rodata section of an ELF file (other executable formats have an equivalent), and under Linux, Windows, and macOS they end up in a memory region that generates a fault when written to (the OS configures this through the MMU or MPU when loading the program).
As part of our training at the Academy of Programming Languages, we also learned C. During the test, we were asked what the output of this program would be:
#include <stdio.h>
#include <string.h>
int main(){
char str[] = "hmmmm..";
const char * const ptr1[] = {"to be","or not to be","that is the question"};
char *ptr2 = "that is the qusetion";
(&ptr2)[3] = str;
strcpy(str,"(Hamlet)");
for (int i = 0; i < sizeof(ptr1)/sizeof(*ptr1); ++i){
printf("%s ", ptr1[i]);
}
printf("\n");
return 0;
}
Later, after examining the answers, it became clear that (&ptr2)[3] occupied the same memory cell as ptr1[2], so the output of the program is: to be or not to be (Hamlet)
My question is: is it possible to know, just from code written in a notebook and without running any compiler, whether a given pointer (or any variable in general) follows or precedes other variables in memory?
Note that I do not mean array elements, which are guaranteed to be stored in sequence.
In this statement:
(&ptr2)[3] = str;
ptr2 was defined with char *ptr2 inside main. With this definition, the compiler is responsible for providing storage for ptr2. The compiler is allowed to use whatever storage it wants for this—it could be before ptr1, it could be after ptr1, it could be close, it could be far away.
Then &ptr2 takes the address of ptr2. This is allowed, but we do not know where that address will be in relation to ptr1 or anything else, because the compiler is allowed to use whatever storage it wants.
Since ptr2 is a char *, &ptr2 is a pointer to char *, also known as char **.
Then (&ptr2)[3] attempts to refer to element 3 of an array of char * that starts at &ptr2. But there is no array there in C’s model of computation; there is just one char *. When you try to refer to element 3 of an array that has no element 3, the behavior is not defined by the C standard.
Thus, this code is a bad example. It appears the test author misunderstood C, and this code does not illustrate what was intended.
char *ptr2 = some initializer;
(&ptr2)[3] = str;
When you evaluate &ptr2, you obtain the address of the memory where the pointer to that initializer is stored.
When you do (&ptr2)[3] = something, you try to write the address of a string 3*sizeof(char *) bytes past the location of ptr2. This is invalid (undefined behavior) and can easily end in a segmentation fault.
No, it's not possible and no such assumptions can be made.
By writing outside a variable's space, this code invokes undefined behavior, it's basically "illegal" and anything can happen when you run it. The C language specification says nothing about variables being allocated on a stack in some particular order that you can exploit, it does however say that accessing random memory is undefined behavior.
Basically, this code is pretty horrible and should never be used, even less so in a teaching environment. It makes me sad how people misunderstand C and still teach it to others. :/
A program is usually loaded into memory with this layout:
the stack, memory-mapped files, the heap, BSS (uninitialized static variables), the data segment (initialized static variables), and text (compiled code).
You can learn more here:
https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
Depending on how you declare a variable, it will go to one of these places.
The compiler arranges BSS and data-segment variables as it wishes at compile time, so you usually cannot tell their order. The same goes for heap variables (the allocator hands out whichever memory block best fits the requested size).
On the stack (which is a LIFO structure), the variables are placed one on top of another, so if you have:
int a = 5;
int b = 10;
you might expect a and b to be placed one after the other. Even this is not guaranteed, though: the compiler is free to reorder them, pad them, or keep them in registers.
The real exception is when the variable is a structure or an array: its members or elements are always laid out in order, each following the last (struct members may have padding between them).
In your code, ptr1 is an array of pointers to char, so its elements follow this rule. The string literals they point to, however, are separate arrays placed wherever the compiler chooses.
Try the following exercise:
#include <stdio.h>
#include <string.h>
int main(void) {
    const char * const ptr1[] = {"to be","or not to be","that is the question"};
    for (size_t i = 0; i < 3; i++) {
        for (size_t j = 0; j < strlen(ptr1[i]); j++)
            printf("%p -> %c\n", (void *)&ptr1[i][j], ptr1[i][j]); /* %p expects a void * */
        printf("\n");
    }
    return 0;
}
and you will see each character's memory address and its content!
Have a nice day.
I wonder why the following code does not cause a segmentation fault when the string returned by dirname() is modified, but does cause one when a string literal created in the usual way is modified:
#include <stdio.h>
#include <stdlib.h>
#include <libgen.h>
#define FILE_PATH "/usr/bin/screen"
int main(void)
{
char db_file[] = FILE_PATH;
char *filename = dirname(db_file);
/* no segfault here */
filename[1] = 'a';
/* segfault here */
char *p = "abc";
p[1] = 'z';
exit(0);
}
I know that modifying a string literal is UB, so the output I get may be perfectly valid, but I wonder whether this can be explained. Are string literals returned by functions treated differently by compilers? The same thing happens when I compile this code with Clang 3.0 on x86 and with gcc on x86 and ARM.
dirname() does not return a pointer to a string literal, so it is fully legal to modify the data the returned pointer refers to. Whether doing so makes sense is a different question, as the returned pointer may point into the char array passed to dirname().
However, had the OP's code passed a string literal to dirname(), that would already have been illegal, as the POSIX specification explicitly states that the function may modify the array passed in.
From the POSIX specifications:
The dirname() function may modify the string pointed to by path.
From the manual
These functions may return pointers to statically allocated memory which may be overwritten by subsequent calls. Alternatively, they may return a pointer to some part of path, so that the string referred to by path should not be modified or freed until the pointer returned by the function is no longer required.
So it might return memory you can modify, or it might not; that depends on your system.
During compilation, string literals are typically stored in a read-only segment, and they are loaded as such at run time.
This question explains things very nicely:
String literals: Where do they go?
The following program prints "Hello\nWorld\n" ('\n' = newline) on the screen, as it is supposed to. But, as I learned, something here isn't done the way it should be. The "Hello" and "World" strings are defined inside functions (and are therefore local, with their memory released at the end of the function's scope, right?). We don't call malloc for them as we are supposed to (to keep the memory alive after the scope ends). So when a() is done, doesn't the stack pointer move back up, so that "World" is placed in memory where "Hello" was? (That doesn't seem to happen here, and I don't understand why. And if the memory block is kept rather than released after the scope, why do I usually need to call malloc?)
Thanks.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *a()
{
char *str1 = "Hello";
return str1;
}
char *b()
{
char *str2 = "World";
return str2;
}
int main()
{
char *main_str1 = a();
char *main_str2 = b();
puts(main_str1);
puts(main_str2);
return 0;
}
Edit: So what you are saying is that my "Hello" string occupies a fixed place in memory, and even though it's inside a function, I can read it from anywhere if I have its address (so it is like malloc'ed memory that you can't free), right?
Constant strings are not allocated on the stack. Only the pointer is allocated on the stack. The pointer returned from a() and b() points to some literal constant part of executable memory.
Another question dealing with this topic
In this case everything works because string literals are stored in memory that is available for the whole lifetime of the program.
Your code is equivalent to this (I mean it produces the same result):
char *a()
{
return "Hello";
}
This code, however, doesn't work:
char* a()
{
char array[6];
strcpy(array,"Hello");
return array;
}
because array[] is created on the stack and destroyed when the function returns.
String literals (strings defined with "quotes") are created statically in the program's memory space at compile time. When you write char *str1 = "Hello";, you aren't creating new memory at run time as you would with a malloc call.
C does not require the compiler to reuse stack memory the way the OP suggests, which is why the program does not fail as expected.
Compiler memory models and optimizations may let a program with undefined behavior (UB), such as the OP's, appear to work without side effects like corrupted memory or segmentation faults. Another compiler may compile the same code with very different results.
A version with allocated memory follows:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *a() {
return strdup("Hello"); // combo strlen, malloc and memcpy
}
char *b() {
return strdup("World");
}
int main() {
char *main_str1 = a();
char *main_str2 = b();
puts(main_str1);
puts(main_str2);
free(main_str1);
free(main_str2);
return 0;
}
#include<stdio.h>
#include<stdlib.h>
char* re()
{
char *p = "hello";
return p;
}
int main()
{
char* tem = re();
printf("%s", tem);
return 0;
}
My compiler is Dev-C++.
I think that when the function re() completes, the pointer p will be destroyed, and the stack space p points to will be released as well. So the pointer tem cannot access the stack space that p pointed to.
In my opinion, this code should show some bugs. Why doesn't it?
This problem has puzzled me for a long time. If you can tell me the reason, I will really appreciate it.
p does not point to stack space. It points to the string literal "hello". Since string literals are guaranteed to be valid throughout the whole program, your program is OK.
(I don't know about Dev-C++, but in most compilers, string literals are allocated in some read-only memory when the program is loaded, and stay there until it ends.)
Edit: note that even if the string were on the stack and the code really were buggy, nothing in the language guarantees that it would not work. Invalid memory can (but does not have to) still contain the value it held before it became invalid.
The string "hello" is not allocated on the stack (but the pointer char *p is). It is in the data segment because it's a constant value (read-only memory).
From C FAQ: http://c-faq.com/decl/strlitinit.html
Here's a simple example of a program that concatenates two strings.
#include <stdio.h>
void strcat(char *s, char *t);
void strcat(char *s, char *t) {
while (*s++ != '\0');
s--;
while ((*s++ = *t++) != '\0');
}
int main() {
char *s = "hello";
strcat(s, " world");
while (*s != '\0') {
putchar(*s++);
}
return 0;
}
I'm wondering why it works. In main(), I have a pointer to the string "hello". According to the K&R book, modifying a string like that is undefined behavior. So why is the program able to modify it by appending " world"? Or is appending not considered as modifying?
Undefined behavior means a compiler can emit code that does anything. Working is a subset of undefined.
I +1'd MSN, but as for why it works, it's because nothing has come along to fill the space behind your string yet. Declare a few more variables, add some complexity, and you'll start to see some wackiness.
Perhaps surprisingly, your compiler has allocated the literal "hello" into read/write initialized data instead of read-only initialized data. Your assignment clobbers whatever is adjacent to that spot, but your program is small and simple enough that you don't see the effects. (Put it in a for loop and see if you are clobbering the " world" literal.)
It fails on Ubuntu x64 because gcc puts string literals in read-only data, and when you try to write, the hardware MMU objects.
You were lucky this time.
Especially in debug mode some compilers will put spare memory (often filled with some obvious value) around declarations so you can find code like this.
It also depends on how the pointer is declared. For example, with this declaration you can change both ptr and what ptr points to:
char * ptr;
You can change what ptr points to, but not ptr:
char * const ptr;
You can change ptr, but not what ptr points to (char const * ptr means the same thing):
const char * ptr;
You can't change anything:
const char * const ptr;
According to the C99 specification (C99 TC3, 6.4.5 §5), string literals are
[...] used to initialize an array of static storage duration and length just sufficient to contain the sequence. [...]
which means they have the type char[], i.e. modification is possible in principle. Why you shouldn't do it is explained in §6:
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
Different string literals with the same contents may - but don't have to - be mapped to the same memory location. As the behaviour is undefined, compilers are free to put them in read-only sections in order to cleanly fail instead of introducing possibly hard to detect error sources.
I'm wondering why it works
It doesn't. It causes a segmentation fault on Ubuntu x64; for code to count as working, it has to do more than work on your machine.
Moving the modified data to the stack gets around the data-area protection in Linux:
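#include <stdio.h>
#include <string.h>
int main() {
char b[] = "hello";
char c[] = " ";
char *s = b;
strcat(s, " world"); // b holds only 6 bytes, so this still overflows it (undefined behavior)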
puts(b);
puts(c);
return 0;
}
Even then, you are only "safe" because " world" happens to fit into unused space between the stack variables; change b to "hello to" and the stack protector detects the corruption:
*** stack smashing detected ***: bin/clobber terminated
The compiler is allowing you to modify s because you have improperly marked it as non-const. A pointer to a static string like that should be
const char *s = "hello";
With the const modifier missing, you've basically disabled the safety that prevents you from writing into memory that you shouldn't write into. C does very little to keep you from shooting yourself in the foot. In this case you got lucky and only grazed your pinky toe.
s points to a bit of memory that holds "hello", but was not intended to contain more than that. This means that it is very likely that you will be overwriting something else. That is very dangerous, even though it may seem to work.
Two observations:
The * in *s-- is not necessary. s-- would suffice, because you only want to decrement the value.
You don't need to write strcat yourself. It already exists (you probably knew that, but I'm telling you anyway:-)).