Why Linux does not crash but output an random string? - c

char* getChar()
{
//char* pStr = "TEST!!!";
char str[10] = "TEST!!!";
return str;
}
int main(int argc, char *argv[])
{
double *XX[2];
printf("STR is %s.\n", getChar());
return (0);
}
I know a temporary variable in a stack SHOULD not be returned.
Actually it will output a undecided string.
When does Linux crash except NULL-Pointer-Reference?

You've got some undefined behavior. Read also this answer to have an idea of what that could mean.
If you wish an explanation, you need to dive into implementation specific details.
Here it goes....
This carp.c file (very similar to yours, I renamed getChar to carp and included <stdio.h>)
#include <stdio.h>
char *carp() {
char str[10] = "TEST!!!";
return str;
}
int main(int argc, char**argv)
{
printf("STR is %s.\n", carp());
return 0;
}
is compiled by gcc -O -fverbose-asm -S carp.c into a good warning
carp.c: In function 'carp':
carp.c:4:8: warning: function returns address of local variable [-Wreturn-local-addr]
return str;
^
and into this assembler code (GCC 4.9.1 on Debian/Sid/x86-64)
.text
.Ltext0:
.globl carp
.type carp, #function
carp:
.LFB11:
.file 1 "carp.c"
.loc 1 2 0
.cfi_startproc
.loc 1 5 0
leaq -16(%rsp), %rax #, tmp85
ret
.cfi_endproc
.LFE11:
.size carp, .-carp
.section .rodata.str1.1,"aMS",#progbits,1
.LC0:
.string "STR is %s.\n"
.text
.globl main
.type main, #function
main:
.LFB12:
.loc 1 7 0
.cfi_startproc
.LVL0:
subq $24, %rsp #,
.cfi_def_cfa_offset 32
.loc 1 8 0
movq %rsp, %rsi #,
.LVL1:
movl $.LC0, %edi #,
.LVL2:
movl $0, %eax #,
call printf #
.LVL3:
.loc 1 10 0
movl $0, %eax #,
addq $24, %rsp #,
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE12:
.size main, .-main
As you notice, the bad carp function is returning the stack pointer minus 16 bytes. And main would print what happens to be there. And what happens to be at that location probably depends upon a lot of factors (your environment environ(7), the ASLR used for the stack, etc....). If you are interested in understanding what exactly is the memory (and address space) at entry into main, dive into execve(2), ld.so(8), your compiler's crt0, your kernel's source code, your dynamic linker source code, your libc source code, the x86-64 ABI, etc.... My life is too short to take many hours to explain all of this.
BTW, notice that the initialization of local str to "TEST!!!" has been rightly optimized out by my compiler.
Read also signal(7): your process can be terminated in many cases (I won't call that "Linux crashing" like you do), e.g. when dereferencing a pointer out of its address space in virtual memory (see also this), executing a bad machine code, etc...

It didn't crash because you got lucky; since you have no way of knowing just how long this string that gets printed is, you have no idea what parts of memory it will venture into.

Related

Understanding argc and argv in main() assembly [duplicate]

This question already has answers here:
gcc argument register spilling on x86-64
(2 answers)
Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?
(1 answer)
x86 explanation, number of function arguments and local variables
(2 answers)
Closed 2 years ago.
I have the following C program to see how the main function is called with argc and argv as follows:
#include <stdio.h>
int main(int argc, char *argv[]) {
// use a non-zero value so we can easily tell it ran properly with echo $?
return 3;
}
And the non-optimized assembly output with $ gcc ifile.c -S -o ifile.s gives us:
.file "ifile.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp) <== here
movq %rsi, -16(%rbp) <== here
movl $3, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",#progbits
I understand this with the exception of the two lines above preceding moving the return value into %eax:
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
What are these two lines doing? I am guessing the first line since it has an offset of 4 is populating the integer value of argc, and the second argument is passing an (8-byte) pointer padded to 16 for the strings that can be passed in the argv. Is this a correct understanding of these items? Where can I learn more about, not so much the full ABI, but the specific details/internals about how the main() function gets invoked and such?

How does x=x+1 is evaluated by the compiler and how is represented in assembly?

I'm trying to understand how does the compiler "sees" the i+1 part from expression i=i+1. I understand that i=3 means putting the value 3 in the location memory of variable i.
My guess about the i=i+1 is that the compiler expects a value from the right side of the "=" operator, so it gets the value from the location memory of variable i (which is 3, after the assignment) and add 1 to it, and the final result of the "i+1" expression(3+1=4) is stored back into the location memory of variable i, as a value. Is that correct?
And if it is, it means that any variable/combination of variables and literals present on the right side of an "=" operator will always be replaced with the value stored in them and those value can be added/substracted/etc with the values from other variables/literals (as in the x+1 expression), whilst the final result of those calculations will also be literal values (ex: 5, literal strings, etc), and will also be stored like values in a single variable on the left side of the "=" operator.
I'm also curious how this code is seen in assembly, and what are the main operations of this incrementation of i ( i = i+1);
#include <stdio.h>
int main()
{
int i = 3;
i = i + 1; // i should have the value of 4 stored back in it;
return 0;
}
This is not answerable for the general case. It depends on the target platform. If you want to inspect the assembly, you can do so with the -S parameter with gcc. When I did that to your code, it gave me this:
/tmp$ cat main.s
.file "main.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $3, -4(%rbp)
addl $1, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
.section .note.GNU-stack,"",#progbits
A brief little explanation of what is happening here. First we push the value of the stackpointer. This is so that we can jump back later.
.cfi_startproc
pushq %rbp
Then we set up the stack frame with this code. It corresponds to declaring variables.
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
Then we have this. Comments are mine.
movl $3, -4(%rbp) # i = 3;
addl $1, -4(%rbp) # i = i + 1;
Lastly, we return from the main function
movl $0, %eax # Store 0 in the "return register"
popq %rbp # Restore stackpointer
.cfi_def_cfa 7, 8
ret # return
Note that there is not a 1-1 relationship between lines. Not even for very simple lines.
Please also note that C imposes requirement on the observable behavior of the program and not on the generated assembly. So for instance, a compiler might remove the whole body for the main function because the variable i is not used in an observable way. And it will if you use optimization. When I recompiled your code with -O3 I got this instead:
/tmp/$ cat main.s
.file "main.c"
.text
.section .text.startup,"ax",#progbits
.p2align 4
.globl main
.type main, #function
main:
.LFB11:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE11:
.size main, .-main
.ident "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
.section .note.GNU-stack,"",#progbits
Notice how much that got removed from main. It can be interesting that movl $0, %eax has changed to xorl %eax, %eax. If you think about it, it's pretty obvious that this is a "set zero" operation. One could reasonably argue why anyone would write stuff like that. Well, the optimizer does certainly not optimize for readability. There are a few reasons why it is better. You can read about them here: What is the best way to set a register to zero in x86 assembly: xor, mov or and?

In which data segment is the C string stored?

I'm wondering what's the difference between char s[] = "hello" and char *s = "hello".
After reading this and this, I'm still not very clear on this question.
As I know, there are five data segments in memory, Text, BSS, Data, Stack and Heap.
From my understanding,
in case of char s[] = "hello":
"hello" is in Text.
s is in Data if it is a global variable or in Stack if it is a local variable.
We also have a copy of "hello" where the s is stored, so we can modify the value of this string via s.
in case of char *s = "hello":
"hello" is in Text.
s is in Data if it is a global variable or in Stack if it is a local variable.
s just points to "hello" in Text and we don't have a copy of it, therefore modifying the value of string via this pointer should cause "Segmentation Fault".
Am I right?
You are right that "hello" for the first case is mutable and for the second case is immutable string. And they are kept in read-only memory before initialization.
In the first case the mutable memory is initialized/copied from immutable string. In the second case the pointer refers to immutable string.
For first case wikipedia says,
The values for these variables are initially stored within the
read-only memory (typically within .text) and are copied into the
.data segment during the start-up routine of the program.
Let us examine segment.c file.
char*s = "hello"; // string
char sar[] = "hello"; // string array
char content[32];
int main(int argc, char*argv[]) {
char psar[] = "parhello"; // local/private string array
char*ps = "phello"; // private string
content[0] = 1;
sar[3] = 1; // OK
// sar++; // not allowed
// s[2] = 1; // segmentation fault
s = sar;
s[2] = 1; // OK
psar[3] = 1; // OK
// ps[2] = 1; // segmentation fault
ps = psar;
ps[2] = 1; // OK
return 0;
}
Here is the assembly generated for segment.c file. Note that both s and sar is in global aka .data segment. It seems sar is const pointer to a mutable initialized memory or not pointer at all(practically it is an array). And eventually it has an implication that sizeof(sar) = 6 is different to sizeof(s) = 8. There are "hello" and "phello" in readonly(.rodata) section and effectively immutable.
.file "segment.c"
.globl s
.section .rodata
.LC0:
.string "hello"
.data
.align 8
.type s, #object
.size s, 8
s:
.quad .LC0
.globl sar
.type sar, #object
.size sar, 6
sar:
.string "hello"
.comm content,32,32
.section .rodata
.LC1:
.string "phello"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $64, %rsp
movl %edi, -52(%rbp)
movq %rsi, -64(%rbp)
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movl $1752326512, -32(%rbp)
movl $1869376613, -28(%rbp)
movb $0, -24(%rbp)
movq $.LC1, -40(%rbp)
movb $1, content(%rip)
movb $1, sar+3(%rip)
movq $sar, s(%rip)
movq s(%rip), %rax
addq $2, %rax
movb $1, (%rax)
movb $1, -29(%rbp)
leaq -32(%rbp), %rax
movq %rax, -40(%rbp)
movq -40(%rbp), %rax
addq $2, %rax
movb $1, (%rax)
movl $0, %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L2
call __stack_chk_fail
.L2:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",#progbits
Again for local variable in main, the compiler does not bother to create a name. And it may keep it in register or in stack memory.
Note that local variable value "parhello" is optimized into 1752326512 and 1869376613 numbers. I discovered it by changing the value of "parhello" to "parhellp". The diff of the assembly output is as follows,
39c39
< movl $1886153829, -28(%rbp)
---
> movl $1869376613, -28(%rbp)
So there is no separate immutable store for psar . It is turned into integers in the code segment.
answer to your first question:
char s[] = "hello";
s is an array of type char. An array is a const pointer, meaning that you cannot change the s using pointer arithmetic (i.e. s++). The data aren't const, though, so you can change it.
See this example C code:
#include <stdio.h>
void reverse(char *p){
char c;
char* q = p;
while (*q) q++;
q--; // point to the end
while (p < q) {
c = *p;
*p++ = *q;
*q-- = c;
}
}
int main(){
char s[] = "DCBA";
reverse( s);
printf("%s\n", s); // ABCD
}
which reverses the text "DCBA" and produces "ABCD".
char *p = "hello"
p is a pointer to a char. You can do pointer arithmetic -- p++ will compile -- and puts data in read-only parts of the memory (const data).
and using p[0]='a'; will result to runtime error:
#include <stdio.h>
int main(){
char* s = "DCBA";
s[0]='D'; // compile ok but runtime error
printf("%s\n", s); // ABCD
}
this compiles, but not runs.
const char* const s = "DCBA";
With a const char* const, you can change neither s nor the data content which point to (i.e. "DCBE"). so data and pointer are const:
#include <stdio.h>
int main(){
const char* const s = "DCBA";
s[0]='D'; // compile error
printf("%s\n", s); // ABCD
}
The Text segment is normally the segment where your code is stored and is const; i.e. unchangeable. In embedded systems, this is the ROM, PROM, or flash memory; in a desktop computer, it can be in RAM.
The Stack is RAM memory used for local variables in functions.
The Heap is RAM memory used for global variables and heap-initialized data.
BSS contains all global variables and static variables that are initialized to zero or not initialized vars.
For more information, see the relevant Wikipedia and this relevant Stack Overflow question
With regards to s itself: The compiler decides where to put it (in stack space or CPU registers).
For more information about memory protection and access violations or segmentation faults, see the relevant Wikipedia page
This is a very broad topic, and ultimately the exact answers depend on your hardware and compiler.

Can I list the local variables in a stack by using some system functions in C language

The C function backtrace just returns a series of functions calls for the programn, but i want to list all the locals variables in my programn, just like the info locals in gdb.Any idea if this can be done? Thanks
Generally, no. You should move away from thinking about a "stack" as some sort of god given factum. A call stack is merely a common implementation technique for C. It has no intrinsic meaning or required semantics. Automatic variables ("local variables", as you say) have to behave in a certain way, and sometimes that means that they are written onto the call stack. However, it is entirely conceivable that local variables are never realized in memory at all -- they may instead only ever be stored in a processor register, or eliminated entirely if an equivalent program can be formulated without them.
So, no, there is no language-intrinsic mechanism for enumerating local variables. As you say, the debugger can do so to some extent (depending on debug symbols being present and subject to optimizations); perhaps you can find a library that can process debug symbols from within a running program.
If this is just for occasional debugging, then you can invoke the debugger. However, since the debugger itself will freeze your program, you need an intermediary to capture the output. You can, for example, use system, and redirect the output to a file, then read the file afterwards. In the example below, the file gdbcmds.txt contains the line info locals.
char buf[512];
FILE *gdb;
snprintf(buf, sizeof(buf), "gdb -batch -x gdbcmds.txt -p %d > gdbout.txt",
(int)getpid());
system(buf);
gdb = fopen("gdbout.txt", "r");
while (fgets(buf, sizeof(buf), gdb) != 0) {
printf("%s", buf);
}
fclose(gdb);
First, note that backtrace is not a standard C library function, but a GNU-specific extension.
In general, it's difficult to impossible retrieve local variable information from compiled code, especially if it was compiled without debugging or with optimization enabled. If debugging isn't turned on, variable names and types are generally not preserved in the resulting machine code.
For example, take the following ridiculously simple code:
#include <stdio.h>
#include <math.h>
int main(void)
{
int x = 1, y = 2, z;
z = 2 * y - x;
printf("x = %d, y = %d, z = %d\n", x, y, z);
return 0;
}
Here's the resulting machine code, no debugging or optimization:
.file "varinfo.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
.string "x = %d, y = %d, z = %d\n"
.text
.align 4
.globl main
.type main,#function
main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $1, -4(%ebp)
movl $2, -8(%ebp)
movl -8(%ebp), %eax
movl %eax, %eax
sall $1, %eax
subl -4(%ebp), %eax
movl %eax, -12(%ebp)
pushl -12(%ebp)
pushl -8(%ebp)
pushl -4(%ebp)
pushl $.LC0
call printf
addl $16, %esp
movl $0, %eax
leave
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)"
x, y, and z are referred to through -4(%ebp), -8(%ebp), and -12(%ebp) respectively. There's nothing to indicate that they're integers other than the instructions used to perform the arithmetic.
It's even better with optimization (-O1) turned on:
.file "varinfo.c"
.version "01.01"
gcc2_compiled.:
.section .rodata.str1.1,"ams",#progbits,1
.LC0:
.string "x = %d, y = %d, z = %d\n"
.text
.align 4
.globl main
.type main,#function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
pushl $3
pushl $2
pushl $1
pushl $.LC0
call printf
movl $0, %eax
leave
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)"
In this case, the compiler was able to do some static analysis and compute the value z at compile time; there's no need to set aside any memory for any of the variables at all, because the compiler already knows what those values have to be.

Stack View when printf is called?

I was just learning about format string vulnerabilities that makes me ask this question
Consider the following simple program:
#include<stdio.h>
void main(int argc, char **argv)
{
char *s="SomeString";
printf(argv[1]);
}
Now clearly, this code is vulnerable to a format String vulnerability. I.e. when the command line argument is %s, then the value SomeString is printed since printf pops the stack once.
What I dont understand is the structure of the stack when printf is called
In my head I imagine the stack to be as follows:
grows from left to right ----->
main() ---> printf()-->
RET to libc_main | address of 's' | current registers| ret ptr to main | ptr to format string|
if this is the case, how does inputting %s to the program, cause the value of s to be popped ?
(OR) If I am totally wrong about the stack structure , please correct me
The stack contents depends a lot on the following:
the CPU
the compiler
the calling conventions (i.e. how parameters are passed in the registers and on the stack)
the code optimizations performed by the compiler
This is what I get by compiling your tiny program with x86 mingw using gcc stk.c -S -o stk.s:
.file "stk.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "SomeString\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB6:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
call ___main
movl $LC0, 28(%esp)
movl 12(%ebp), %eax
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp)
call _printf
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE6:
.def _printf; .scl 2; .type 32; .endef
And this is what I get using gcc stk.c -S -O2 -o stk.s, that is, with optimizations enabled:
.file "stk.c"
.def ___main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 2,,3
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB7:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl 12(%ebp), %eax
movl 4(%eax), %eax
movl %eax, (%esp)
call _printf
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE7:
.def _printf; .scl 2; .type 32; .endef
As you can see, in the latter case there's no pointer to "SomeString" on the stack. In fact, the string isn't even present in the compiled code.
In this simple code there are no registers saved on the stack because there aren't any variables allocated to registers that need to be preserved across the call to printf().
So, the only things you get on the stack here are the string pointer (optionally), unused space due to stack alignment (andl $-16, %esp + subl $32, %esp align the stack and allocate space for local variables, none here), the printf()'s parameter, the return address for returning from printf() back to main().
In the former case the pointer to "SomeString" and the printf()'s parameter (value of argv[1]) are quite far away from one another:
movl $LC0, 28(%esp) ; address of "SomeString" is at esp+28
movl 12(%ebp), %eax
addl $4, %eax
movl (%eax), %eax
movl %eax, (%esp) ; address of a copy of argv[1] is at esp
call _printf
To make the two addresses stored one right after the other on the stack, if that's what you want, you'd need to play with the code, compilation/optimization options or use a different compiler.
Or you could supply a format string in argv[1] such that printf() would reach it. You could, for example, include a number of fake parameters in the format string.
For example, if I compile this piece of code using gcc stk.c -o stk.exe and run it as stk.exe %u%u%u%u%u%u%s, I'll get the following output from it:
4200532268676042006264200532880015253SomeString
All of this is pretty hacky and it's not trivial to make it work right.
On x86 the stack on a function call could look something like:
: :
+--------------+
: alignment :
+--------------+
12(%ebp) | arg2 |
+--------------+
8(%ebp) | arg1 |
+--------------+
4(%ebp) | ret | -----> return address
+--------------+
(%ebp) | ebp | -----> previous ebp value
+--------------+
-4(%ebp) | local1 | -----> local vars, sometimes they can overflow ;-)
+--------------+
: alignment :
+--------------+
: :
If you used -fomit-frame-pointer ebp would not be saved on the stack. At different optimization levels some variables may disappear (be optimized out), ...
Other ABIs store function arguments on registers, instead of saving them on the stack. Later, before calling another function, live registers may spill into the stack.

Resources