what is stack smashing (C)?

what is stack smashing (C)? - c

Code:
int str_join(char *a, const char *b) {
int sz =0;
while(*a++) sz++;
char *st = a -1, c;
*st = (char) 32;
while((c = *b++)) *++st = c;
*++st = 0;
return sz;
}
....
char a[] = "StringA";
printf("string-1 length = %d, String a = %s\n", str_join(&a[0],"StringB"), a);
Output:
string-1 length = 7, char *a = StringA StringB
*** stack smashing detected **** : /T02 terminated
Aborted (core dumped)
I don't understand why it's showing stack smashing? and what is *stack smashing? Or is it my compiler's error?

Well, stack smashing or stack buffer overflow is a rather detailed topic to be discussed here, you can refer to this wiki article for more info.
Coming to the code shown here, the problem is, your array a is not large enough to hold the final concatenated result.
Thereby, by saying
while((c = *b++)) *++st = c;
you're essentially accessing out of bound memory which invokes undefined behavior. This is the reason you're getting the "stack smashing" issue because you're trying to access memory which does not belong to your process.
To solve this, you need to make sure that array a contains enough space to hold both the first and second string concatenated together. You have to provide a larger destination array, in short.

Stack smashing means you've written outside of ("smashed" past/through) the function's storage space for local variables (this area is called the "stack", in most systems and programming languages). You may also find this type of error called "stack overflow" and/or "stack underflow".
In your code, C is probably putting the string pointed to by a on the stack. In your case, the place that causes the stack "smash" is when you increment st beyond the original a pointer and write to where it points, you're writing outside the area the C compiler guarantees to have reserved for the original string assigned into a.
Whenever you write outside an area of memory that is already properly "reserved" in C, that's "undefined behavior" (which just means that the C language/standard doesn't say what happens): usually, you end up overwriting something else in your program's memory (programs typically put other information right next to your variables on the stack, like return addresses and other internal details), or your program tries writing outside of the memory the operating system has "allowed" it to use. Either way, the program typically breaks, sometimes immediately and obviously (for example, with a "segmentation fault" error), sometimes in very hidden ways that don't become obvious until way later.
In this case, your compiler is building your program with special protections to detect this problem and so your programs exits with an error message. If the compiler didn't do that, your program would try to continue to run, except it might end up doing the wrong thing and/or crashing.
The solution comes down to needing to explicitly tell your code to have enough memory for your combined string. You can either do this by explicitly specifying the length of the "a" array to be long enough for both strings, but that's usually only sufficient for simple uses where you know in advance how much space you need. For a general-purpose solution, you'd use a function like malloc to get a pointer to a new chunk of memory from the operating system that has the size you need/want once you've calculated what the full size is going to be (just remember to call free on pointers that you get from malloc and similar functions once you're done with them).

Minimal reproduction example with disassembly analysis
main.c
void myfunc(char *const src, int len) {
int i;
for (i = 0; i < len; ++i) {
src[i] = 42;
}
}
int main(void) {
char arr[] = {'a', 'b', 'c', 'd'};
int len = sizeof(arr);
myfunc(arr, len + 1);
return 0;
}
GitHub upstream.
Compile and run:
gcc -fstack-protector-all -g -O0 -std=c99 main.c
ulimit -c unlimited && rm -f core
./a.out
fails as desired:
*** stack smashing detected ***: terminated
Aborted (core dumped)
Tested on Ubuntu 20.04, GCC 10.2.0.
On Ubuntu 16.04, GCC 6.4.0, I could reproduce with -fstack-protector instead of -fstack-protector-all, but it stopped blowing up when I tested on GCC 10.2.0 as per Geng Jiawen's comment. man gcc clarifies that as suggested by the option name, the -all version adds checks more aggressively, and therefore presumably incurs a larger performance loss:
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call "alloca", and functions with buffers larger than or equal to 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits. Only variables that are actually allocated on the stack are considered, optimized away variables or variables allocated in registers don't count.
-fstack-protector-all
Like -fstack-protector except that all functions are protected.
Disassembly
Now we look at the disassembly:
objdump -D a.out
which contains:
int main (void){
400579: 55 push %rbp
40057a: 48 89 e5 mov %rsp,%rbp
# Allocate 0x10 of stack space.
40057d: 48 83 ec 10 sub $0x10,%rsp
# Put the 8 byte canary from %fs:0x28 to -0x8(%rbp),
# which is right at the bottom of the stack.
400581: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
400588: 00 00
40058a: 48 89 45 f8 mov %rax,-0x8(%rbp)
40058e: 31 c0 xor %eax,%eax
char arr[] = {'a', 'b', 'c', 'd'};
400590: c6 45 f4 61 movb $0x61,-0xc(%rbp)
400594: c6 45 f5 62 movb $0x62,-0xb(%rbp)
400598: c6 45 f6 63 movb $0x63,-0xa(%rbp)
40059c: c6 45 f7 64 movb $0x64,-0x9(%rbp)
int len = sizeof(arr);
4005a0: c7 45 f0 04 00 00 00 movl $0x4,-0x10(%rbp)
myfunc(arr, len + 1);
4005a7: 8b 45 f0 mov -0x10(%rbp),%eax
4005aa: 8d 50 01 lea 0x1(%rax),%edx
4005ad: 48 8d 45 f4 lea -0xc(%rbp),%rax
4005b1: 89 d6 mov %edx,%esi
4005b3: 48 89 c7 mov %rax,%rdi
4005b6: e8 8b ff ff ff callq 400546 <myfunc>
return 0;
4005bb: b8 00 00 00 00 mov $0x0,%eax
}
# Check that the canary at -0x8(%rbp) hasn't changed after calling myfunc.
# If it has, jump to the failure point __stack_chk_fail.
4005c0: 48 8b 4d f8 mov -0x8(%rbp),%rcx
4005c4: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
4005cb: 00 00
4005cd: 74 05 je 4005d4 <main+0x5b>
4005cf: e8 4c fe ff ff callq 400420 <__stack_chk_fail#plt>
# Otherwise, exit normally.
4005d4: c9 leaveq
4005d5: c3 retq
4005d6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4005dd: 00 00 00
Notice the handy comments automatically added by objdump's artificial intelligence module.
If you run this program multiple times through GDB, you will see that:
the canary gets a different random value every time
the last loop of myfunc is exactly what modifies the address of the canary
The canary randomized by setting it with %fs:0x28, which contains a random value as explained at:
https://unix.stackexchange.com/questions/453749/what-sets-fs0x28-stack-canary
Why does this memory address %fs:0x28 ( fs[0x28] ) have a random value?
How to debug it?
See: Stack smashing detected

Related

How does C compiler declare array when there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate?

This may be a simple question but I am trying to understand how the compiler assigns all the elements in the array to zero for example:
''' array[5] = {0}; '''
I checked gcc.gnu.org for some explanation but was unsuccessful. Maybe I am looking in the wrong location. I also checked C99 and stumbled upon section 6.7.8/21, but it does not tell me how it is initialized. I also looked at other posts about this topic but most address what happens, not how it happens.
Anyway, my questions are:
How does the compiler assign all the elements to zero through this single expression?
Where can I find information on how the compiler works, specifically with the example above?
Thanks...

C 2018 6.7.9 19 says:
… all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.
C 2018 6.7.9 10 says, for objects with static storage duration that are not initialized explicitly:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
Thus, when you initialize part but not all of an array, the rules of the C standard require the compiler to initialize the other elements to zero for arithmetic types and a null pointer for pointer types.
Compilers are free to implement this as they choose. The method is not usually document, and it varies by situation: For a small array, the compiler might generate explicit instructions. For a large array, the compiler might generate a loop (including a single instruction that implements a loop to zero memory) or call a library routine. Or, if the compiler can detect that initialization is not actually necessary for correct program operation, it might omit it. For example, if the compiler can see that a subsequent assignment sets an array element to a value before that array element is used, the compiler might omit any zeroing of the array element and instead let the assignment be the initial value.

This is an implementation detail that will vary widely between different compilers and different source code, but the general answer is the same - the compiler will generate whatever code is necessary to zero out the uninitialized elements.
Here's how things shake out on my implementation (gcc version 7.3.1).
Source code (file init.c):
#include <stdio.h>
int main( void )
{
int arr[5] = {0};
for ( size_t i = 0; i < 5; i++ )
printf( "arr[%zu] = %d\n", i, arr[i] );
return 0;
}
Compiled using the command
gcc -o init -std=c11 -pedantic -Wall -Werror init.c
gives me the following machine code (viewed using objdump -d init):
00000000004004c7 <main>:
4004c7: 55 push %rbp
4004c8: 48 89 e5 mov %rsp,%rbp
4004cb: 48 83 ec 20 sub $0x20,%rsp ;; allocates space for arr and i, rounded up to multiple of 4
4004cf: 48 c7 45 e0 00 00 00 movq $0x0,-0x20(%rbp) ;; zeros out a[0] and a[1]
4004d6: 00
4004d7: 48 c7 45 e8 00 00 00 movq $0x0,-0x18(%rbp) ;; zeros out a[2] and a[3]
4004de: 00
4004df: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp) ;; zeros out a[4]
4004e6: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp) ;; zeros out i
...
Above a certain size, explicitly zeroing out one or two elements at a time gets cumbersome - when I change the array size to 50, the generated assembly code became
00000000004004c7 <main>:
4004c7: 55 push %rbp
4004c8: 48 89 e5 mov %rsp,%rbp
4004cb: 48 81 ec d0 00 00 00 sub $0xd0,%rsp
4004d2: 48 8d 95 30 ff ff ff lea -0xd0(%rbp),%rdx
4004d9: b8 00 00 00 00 mov $0x0,%eax
4004de: b9 19 00 00 00 mov $0x19,%ecx
4004e3: 48 89 d7 mov %rdx,%rdi
4004e6: f3 48 ab rep stos %rax,%es:(%rdi)
4004e9: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
4004f0: 00
...
The rep stos instruction is essentially a loop (I don't speak x86 assembly fluently, somebody else will have to explain exactly how it works). It zeros out each element of the array in succession, rather than zeroing out the entire array in a single instruction if I understand it correctly.

Segmentation fault when calling a function located in the heap

I'm trying to tweak the rules a little bit here, and malloc a buffer,
then copy a function to the buffer.
Calling the buffered function works, but the function throws a Segmentation fault when i'm trying to call another function within.
Any thoughts why?
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
int foo(int x)
{
printf("%d\n", x);
}
int bar(int x)
{
}
int main()
{
int foo_size = bar - foo;
void* buf_ptr;
buf_ptr = malloc(1024);
memcpy(buf_ptr, foo, foo_size);
mprotect((void*)(((int)buf_ptr) & ~(sysconf(_SC_PAGE_SIZE) - 1)),
sysconf(_SC_PAGE_SIZE),
PROT_READ|PROT_WRITE|PROT_EXEC);
int (*ptr)(int) = buf_ptr;
printf("%d\n", ptr(3));
return 0;
}
This code will throw a segfault, unless i'll change the foo function to:
int foo(int x)
{
//Anything but calling another function.
x = 4;
return x;
}
NOTE:
The code successfully copies foo into the buffer, i know i made some assumptions, but on my platform they're ok.

Your code is not position independent and even if it were, you don't have the correct relocations to move it to an arbitrary position. Your call to printf (or any other function) will be done with pc-relative addressing (through the PLT, but that's besides the point here). This means that the instruction generated to call printf isn't a call to a static address but rather "call the function X bytes from the current instruction pointer". Since you moved the code the call is done to a bad address. (I'm assuming i386 or amd64 here, but generally it's a safe assumption, people who are on weird platforms usually mention that).
More specifically, x86 has two different instructions for function calls. One is a call relative to the instruction pointer which determines the destination of the function call by adding a value to the current instruction pointer. This is the most commonly used function call. The second instruction is a call to a pointer inside a register or memory location. This is much less commonly used by compilers because it requires more memory indirections and stalls the pipeline. The way shared libraries are implemented (your call to printf will actually go to a shared library) is that for every function call you make outside of your own code the compiler will insert fake functions near your code (this is the PLT I mentioned above). Your code does a normal pc-relative call to this fake function and the fake function will find the real address to printf and call that. It doesn't really matter though. Almost any normal function call you make will be pc-relative and will fail. Your only hope in code like this are function pointers.
You might also run into some restrictions on executable mprotect. Check the return value of mprotect, on my system your code doesn't work for one more reason: mprotect doesn't allow me to do this. Probably because the backend memory allocator of malloc has additional restrictions that prevents executable protections of its memory. Which leads me to the next point:
You will break things by calling mprotect on memory that isn't managed by you. That includes memory you got from malloc. You should only mprotect things you've gotten from the kernel yourself through mmap.
Here's a version that demonstrates how to make this work (on my system):
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>
#include <err.h>
int
foo(int x, int (*fn)(const char *, ...))
{
fn("%d\n", x);
return 42;
}
int
bar(int x)
{
return 0;
}
int
main(int argc, char **argv)
{
size_t foo_size = (char *)bar - (char *)foo;
int ps = getpagesize();
void *buf_ptr = mmap(NULL, ps, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANON|MAP_PRIVATE, -1, 0);
if (buf_ptr == MAP_FAILED)
err(1, "mmap");
memcpy(buf_ptr, foo, foo_size);
int (*ptr)(int, int (*)(const char *, ...)) = buf_ptr;
printf("%d\n", ptr(3, printf));
return 0;
}
Here, I abuse the knowledge of how the compiler will generate the code for the function call. By using a function pointer I force it to generate a call instruction that isn't pc-relative. Also, I manage the memory allocation myself so that we get the right permissions from start and not run into any restrictions that brk might have. As a bonus we do error handling that actually helped me find a bug in the first version of this experiment and I also corrected other minor bugs (like missing includes) which allowed me to enable warnings in the compiler and catch another potential problem.
If you want to dig deeper into this you can do something like this. I added two versions of the function:
int
oldfoo(int x)
{
printf("%d\n", x);
return 42;
}
int
foo(int x, int (*fn)(const char *, ...))
{
fn("%d\n", x);
return 42;
}
Compile the whole thing and disassemble it:
$ cc -Wall -o foo foo.c
$ objdump -S foo | less
We can now look at the two generated functions:
0000000000400680 <oldfoo>:
400680: 55 push %rbp
400681: 48 89 e5 mov %rsp,%rbp
400684: 48 83 ec 10 sub $0x10,%rsp
400688: 89 7d fc mov %edi,-0x4(%rbp)
40068b: 8b 45 fc mov -0x4(%rbp),%eax
40068e: 89 c6 mov %eax,%esi
400690: bf 30 08 40 00 mov $0x400830,%edi
400695: b8 00 00 00 00 mov $0x0,%eax
40069a: e8 91 fe ff ff callq 400530 <printf#plt>
40069f: b8 2a 00 00 00 mov $0x2a,%eax
4006a4: c9 leaveq
4006a5: c3 retq
00000000004006a6 <foo>:
4006a6: 55 push %rbp
4006a7: 48 89 e5 mov %rsp,%rbp
4006aa: 48 83 ec 10 sub $0x10,%rsp
4006ae: 89 7d fc mov %edi,-0x4(%rbp)
4006b1: 48 89 75 f0 mov %rsi,-0x10(%rbp)
4006b5: 8b 45 fc mov -0x4(%rbp),%eax
4006b8: 48 8b 55 f0 mov -0x10(%rbp),%rdx
4006bc: 89 c6 mov %eax,%esi
4006be: bf 30 08 40 00 mov $0x400830,%edi
4006c3: b8 00 00 00 00 mov $0x0,%eax
4006c8: ff d2 callq *%rdx
4006ca: b8 2a 00 00 00 mov $0x2a,%eax
4006cf: c9 leaveq
4006d0: c3 retq
The instruction for the function call in the printf case is "e8 91 fe ff ff". This is a pc-relative function call. 0xfffffe91 bytes in front of our instruction pointer. It's treated as a signed 32 bit value, and the instruction pointer used in the calculation is the address of the next instruction. So 0x40069f (next instruction) - 0x16f (0xfffffe91 in front is 0x16f bytes behind with signed math) gives us the address 0x400530, and looking at the disassembled code I find this at the address:
0000000000400530 <printf#plt>:
400530: ff 25 ea 0a 20 00 jmpq *0x200aea(%rip) # 601020 <_GLOBAL_OFFSET_TABLE_+0x20>
400536: 68 01 00 00 00 pushq $0x1
40053b: e9 d0 ff ff ff jmpq 400510 <_init+0x28>
This is the magic "fake function" I mentioned earlier. Let's not get into how this works. It's necessary for shared libraries to work and that's all we need to know for now.
The second function generates the function call instruction "ff d2". This means "call the function at the address stored inside the rdx register". No pc-relative addressing and that's why it works.

The compiler is free to generate the code the way it wants provided the observable results are correct (as if rule). So what you do is just an undefined behaviour invocation.
Visual Studio sometimes uses relays. That means that the address of a function just points to a relative jump. That's perfectly allowed per standard because of the as is rule but it would definitely break that kind of construction. Another possibility is to have local internal functions called with relative jumps but outside of the function itself. In that case, your code would not copy them, and the relative calls will just point to random memory. That means that with different compilers (or even different compilation options on same compiler) it could give expected result, crash, or directly end the program without error which is exactly UB.

I think I can explain a bit. First of all, if both your functions have no return statement within, an undefined behaviour is invoked as per standard §6.9.1/12. Secondly, which is most common on a lot of platforms, and yours apparently as well, is the following: relative addresses of functions are hardcoded into binary code of functions. That means, that if you have a call of "printf" within "foo" and then you move (e.g. execute) from another location, that address, from which "printf" should be called, turns bad.

Generating simple shell binary code to be copied to the stack for stack overflow

I am trying to implement the buffer overflow attack, but I need to generate instruction code in binary so I can put it on the stack to be executed. My problem is that the instructions that I am getting now have jumps to different parts of a program which becomes hard to put on the stack. So I have this simple piece of code (not the code to be exploited) that I want to put on the stack to spawn a new shell.
#include <stdio.h>
int main( ) {
char *buf[2];
buf[0] = "/bin/bash";
buf[1] = NULL;
execve(buf[0], buf, NULL);
}
The code is being compiled with gcc with the following flags:
CFLAGS = -Wall -Wextra -g -fno-stack-protector -m32 -z execstack
LDFLAGS = -fno-stack-protector -m32 -z execstack
Finally using objdump -d -S, I get the following code (parts of it) in hex:
....
....
08048320 <execve#plt>:
8048320: ff 25 08 a0 04 08 jmp *0x804a008
8048326: 68 10 00 00 00 push $0x10
804832b: e9 c0 ff ff ff jmp 80482f0 <_init+0x3c>
....
....
int main( ) {
80483e4: 55 push %ebp
80483e5: 89 e5 mov %esp,%ebp
80483e7: 83 e4 f0 and $0xfffffff0,%esp
80483ea: 83 ec 20 sub $0x20,%esp
char *buf[2];
buf[0] = "/bin/bash";
80483ed: c7 44 24 18 f0 84 04 movl $0x80484f0,0x18(%esp)
80483f4: 08
buf[1] = NULL;
80483f5: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
80483fc: 00
execve(buf[0], buf, NULL);
80483fd: 8b 44 24 18 mov 0x18(%esp),%eax
8048401: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
8048408: 00
8048409: 8d 54 24 18 lea 0x18(%esp),%edx
804840d: 89 54 24 04 mov %edx,0x4(%esp)
8048411: 89 04 24 mov %eax,(%esp)
8048414: e8 07 ff ff ff call 8048320 <execve#plt>
}
8048419: c9 leave
804841a: c3 ret
804841b: 90 nop
As you can see this code is hard to copy onto the stack. execve jumps to a different part of the assembly code to be executed. Is there a way to nicely get a program which can be put compactly on the stack without too much space and branches being used?

If you want a clean assembly code without coding it yourself, do the following:
Move your code to a separate function
Pass the -O0 compilation flag to gcc in order to prevent optimizations
Compile just an object file - use gcc [input file] -o [output file]
Follow these steps, and use the assembly code generated from objdump.
Please keep in mind, you have an external dependency for execve:
8048414: e8 07 ff ff ff call 8048320 <execve#plt>
so you must either include its implementation explicitly and remove the call, or know in advance that the process you want to attack has this function in its address space, and modify the call address to match the process execve address.

Welcome to Stack Overflow (pun intended).
This is an unusual request. Are you trying to get hired by the NSA?
Compilers don't structure assembly code in a very human friendly form, obviously. And their idea of optimization might be for performance rather than for compactness. Therefore, you might consider hand-coding it in assembler, using the compiler output as a guideline to achieve the effect you're going for.
What you have there may not be the best representation of what the compiler can do to inform your investigation in any case. Put the code into a function other than main, so you can just get the minimal stack setup and argument handling necessary, and try compiling it with all the different optimization levels to see what it does.
You're probably getting some extra setup overhead for putting it in main(), because it's the main entry point to your program and has to interface with libc and the OS (just guessing), and may be setting things up for the program's operational context in general (which you could presume would have been done for whichever executable you were inserting the code into so it would be redundant).

Where does constant local variable array go in memory for a 'C' program

I am using GCC 4.8.1 and it doesn't seem to store const variable local to main in DATA segment. Below is code and memory map for 3 such programs:
Code 1:
int main(void)
{ //char a[10]="HELLO"; //1 //const char a[10] = "HELLO"; //2
return 0;
}
MEMORY MAP FOR ABOVE:
text data bss dec hex filename
7264 1688 1040 9992 2708 a.exe
CODE 2:
int main(void)
{
char a[10]="HELLO";
//const char a[10] = "HELLO";
return 0;
}
MEMORY MAP FOR 2:
text data bss dec hex filename
7280 1688 1040 10008 2718 a.exe
CODE 3:
int main(void)
{
//char a[10]="HELLO";
const char a[10] = "HELLO";
return 0;
}
MEMORY MAP FOR 3 :
text data bss dec hex filename
7280 1688 1040 10008 2718 a.exe
I do not see any difference in data segment between 3 codes. Can someone please explain this result to me.
Thanks in anticipation!

If your array is not used by your program so the compiler is allowed to simply optimize out the object.
From the C Standard:
(C99, 5.1.2.3p1) "The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant"
and
(C99, 5.1.2.3p3) "In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."
If you compile your program #3 with optimizations disabled (-O0) or depending on your compiler the object can still be allocated. In your case it does not appear in the data and rodata section but in text section thus the text section increase.
For example in your third example, in my compiler the resulting code with -O0 is (dumped using objdump -d):
00000000004004d0 <main>:
4004d0: 55 push %rbp
4004d1: 48 89 e5 mov %rsp,%rbp
4004d4: 48 b8 48 45 4c 4c 4f mov $0x4f4c4c4548,%rax
4004db: 00 00 00
4004de: 48 89 45 f0 mov %rax,-0x10(%rbp)
4004e2: 66 c7 45 f8 00 00 movw $0x0,-0x8(%rbp)
4004e8: b8 00 00 00 00 mov $0x0,%eax
4004ed: 5d pop %rbp
4004ee: c3 retq
4004ef: 90 nop
0x4f4c4c4548 is the ASCII characters of your string moved in a register and then pushed in the stack.
If I compile the same program with -O3, the output is simply:
00000000004004d0 <main>:
4004d0: 31 c0 xor %eax,%eax
4004d2: c3 retq
4004d3: 90 nop
and the string does not appear in data or rodata, it is simply optimized out.

This is what should happen:
Code 1: nothing is stored anywhere.
Code 2: a is stored on the stack. It is not stored in .data.
Code 3 a is either stored on the stack or in .rodata, depending on whether it is initialized with a constant expression or not. The optimizer might also decide to store it in .text (together with the code).
I do not see any difference in data segment between 3 codes.
That's because there should be no difference. .data is used for non-constant variables with static storage duration that are initialized to a value other than zero.

Why does my program not overflow the stack when I allocate a 11MB char array while the stack upper limit is 10MB?

I have two simple C++ programs and two questions here. I'm working in CentOS 5.2 and my dev environment is as follows:
g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)
"ulimit -s" output: 10240 (kbytes), that is, 10MB
Program #1:
main.cpp:
int main(int argc, char * argv[])
{
char buf[1024*1024*11] = {0};
return 0;
}
(Compiled with "g++ -g main.cpp")
The program allocates 1024*1024*11 bytes(that is, 11MB) on the stack but it doesn't crash. After I change the allocation size to 1024*1024*12(that is, 12MB), the program crashes. I think this should be caused by a stack overflow. But Why does the program not crash when the allocation size is 11MB, which is also greater than the 10MB-upper-limit??
Program #2:
main.cpp:
#include <iostream>
int main(int argc, char * argv[])
{
char buf[1024*1024*11] = {0};
std::cout << "*** separation ***" << std::endl;
char buf2[1024*1024] = {0};
return 0;
}
(Compiled with "g++ -g main.cpp")
This program would result in a program crash because it allocates 12MB bytes on the stack. However, according to the core dump file(see below) the crash occurs on the buf but not buf2. Shouldn't the crash happen to buf2 because we know from program #1 that the allocation of char buf[1024*1024*11] is OK thus after we allocate another 1024*1024 bytes the stack would overflow?
I think there must be some quite fundamental concepts that I didn't build a solid understanding. But what are they??
Appendix: The core-dump info generated by program #2:
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
[New process 16433]
#0 0x08048715 in main () at main.cpp:5
5 char buf[1024*1024*11] = {0};

You're wrongly assuming the stack allocations happens where they appear in your code. Anytime you have local variables whose size are known at compile time, space for those will be allocated together when the function is entered. Only dynamic sized local variables are allocated later (VLAs and alloca).
Furthermore the error happens as soon as you write to the memory, not when it's first allocated. Most likely buf is located before buf2 on the stack and the overflow thus happens in buf, not buf2.

To analyze these kind of mysteries, it is always useful to look at the generated code. My guess is that your particular compiler version is doing something different, because mine segfaults with -O0, but not with -O1.
From your program #1, with g++ a.c -g -O0, and then objdump -S a.out
int main(int argc, char * argv[])
{
8048484: 55 push %ebp
8048485: 89 e5 mov %esp,%ebp
This is the standard stack frame. Nothing to see here.
8048487: 83 e4 f0 and $0xfffffff0,%esp
Align the stack to multiple of 16, just in case.
804848a: 81 ec 30 00 b0 00 sub $0xb00030,%esp
Allocate 0xB00030 bytes of stack space. That is 1024*1024*11 + 48 bytes. No access to the memory yet, so no exception. The extra 48 bytes are of internal use of the compiler.
8048490: 8b 45 0c mov 0xc(%ebp),%eax
8048493: 89 44 24 1c mov %eax,0x1c(%esp) <--- SEGFAULTS
The first time the stack is accessed is beyond the ulimit, so it segfaults.
8048497: 65 a1 14 00 00 00 mov %gs:0x14,%eax
Thiis is the stack-protector.
804849d: 89 84 24 2c 00 b0 00 mov %eax,0xb0002c(%esp)
80484a4: 31 c0 xor %eax,%eax
char buf[1024*1024*11] = {0};
80484a6: 8d 44 24 2c lea 0x2c(%esp),%eax
80484aa: ba 00 00 b0 00 mov $0xb00000,%edx
80484af: 89 54 24 08 mov %edx,0x8(%esp)
80484b3: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
80484ba: 00
80484bb: 89 04 24 mov %eax,(%esp)
80484be: e8 d1 fe ff ff call 8048394 <memset#plt>
Initialize the array, calling memset
return 0;
80484c3: b8 00 00 00 00 mov $0x0,%eax
}
As you can see, the segfault happens when the internal variables are accessed, because they happen to be below the big array (they have to be, because there is the stack protector, to detect stack smashing).
If you compile with optimizations, the compiler notices that you do nothing useful with the array and optimizes it out. So no sigseg.
Probably your version of GCC is a bit oversmart in non-optimization mode, and removes the array. We can analyze it further if you post the output of objdump -S a.out.

When defining local variables on the stack, there's no real allocation of memory like it is done in the heap. The stack memory allocation consists more simply in changing the address of the stack pointer (that is going to be used by called functions) to reserve the wanted memory.
I suspect that this operation of changing the stack pointer is done only once, at the beginning of the function, to reserve space for all the used local variable (by oposition of changing it once per local variable). This explains why the error on your program #2 occurs on the first allocation.

Both of your programs should ideally give segfault.
Normally whenever a function is entered, all the variables defined in it are allocated memory onto the stack. This said, it however also depends on the optimization level with which the code is compiled.
Optimization level zero, indicated during compilation as -O0 indicates no optimzation at all. It is also the default optimization level with which a code is compiled. The above mentioned programs when compiled with -O0, gives segfault.
However, when you compile the programs using higher optimization levels, the compiler notices that the variable which is defined is not used in the function. It therefore removes the variable definition from the assembly language code. As a result, your programs won't give any segfault.