I am new to C, so forgive me if this query is basic.
I want to call main() from another function, and make the program run infinitely. The code is here:
#include <stdio.h>
void message();
int main()
{
    message();
    return 0;
}
void message()
{
    printf("This is a test message. \n");
    main();
}
I expect to see this program run infinitely. However, it runs for some time and then stops suddenly. Using a counter variable, which I printed alongside the test message, I found that the statement "This is a test message." is printed 174608 times, after which I get the error message
Segmentation fault (core dumped)
and the program terminates. What does this error mean? And why does the program only run 174608 times (why not infinitely)?
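You have a stack overflow caused by infinite recursion. Use an infinite loop in main instead:
#include <stdio.h>
void message(void);
int main(void)
{
    while (1)
    {
        message(); /* no recursion: the stack does not grow */
    }
}
void message(void)
{
    printf("This is a test message. \n");
}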
Each call in the mutual recursion between main() and message() costs stack space, and none of those calls ever returns. If you put the recursion in main() itself, the compiler may recognise the tail recursion and replace it with iteration. [For fun and education only; don't try this at home, children...]:
#include <stdio.h>
void message();
int main()
{
    message();
    return main();
}
void message()
{
    printf("This is a test message. \n");
}
GCC recognises the tail recursion at optimisation level -O2 and above.
main.s output for gcc -O2 -S main.c:
    .p2align 4,,15
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    .p2align 4,,7
    .p2align 3
.L4:
    call    message
    jmp     .L4
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3"
    .section    .note.GNU-stack,"",@progbits
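Note that this program still relies on the optimiser: it is not equivalent to while(1) {...} or for(;;) {...}, which are guaranteed to give you infinite loops. Without the tail-call optimisation (for example at -O0), return main(); overflows the stack just like the original program.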
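To make the tail-call transformation concrete, here is a hand-written sketch on a hypothetical pair of functions (count_messages_rec and count_messages_loop are not from the question). The first calls itself in tail position; the second is the loop an optimising compiler may turn it into. Only the loop version is guaranteed not to grow the stack.

```c
#include <stdio.h>

/* Hypothetical example: prints `n` messages and returns how many were printed.
   The recursive call is the last thing the function does (a tail call), so an
   optimising compiler MAY turn it into a jump; it is not required to. */
unsigned long count_messages_rec(unsigned long n, unsigned long done)
{
    if (n == 0)
        return done;
    printf("This is a test message. \n");
    return count_messages_rec(n - 1, done + 1); /* tail call */
}

/* The same logic after tail-call elimination, written by hand: the call is
   replaced by updating the parameters and jumping back to the top. */
unsigned long count_messages_loop(unsigned long n, unsigned long done)
{
    for (;;) {
        if (n == 0)
            return done;
        printf("This is a test message. \n");
        n = n - 1;
        done = done + 1;
    }
}
```

Writing the loop yourself, as in the while (1) answer, removes the dependence on the optimisation level.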
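Every time a function (for example, main() or message()) is called, some values, such as the return address and the function's local variables, are pushed onto the stack. When functions keep calling each other without ever returning, the stack fills up and finally overflows. On Linux this shows up as a segmentation fault, because the program tries to use memory past the end of the stack. The number of calls you get (174608 here) depends on the stack size limit and on how much stack each call uses.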
Note that this error has nothing to do with this site, although they happen to have the same name :)
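A rough way to see the per-call stack cost described above is to record the address of a local variable at each recursion level. This is a sketch under stated assumptions: the helper names descend and stack_bytes_for_depth are hypothetical, converting addresses to uintptr_t and subtracting them is implementation-defined, and the numbers vary with compiler, flags and architecture. Dividing the stack size limit (often 8 MiB on Linux; see ulimit -s) by the per-call figure gives a rough estimate of how many calls fit before the crash.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of the deepest frame's local variable, recorded by descend(). */
static uintptr_t deepest_frame;

/* Recurse `depth` times. `marker` is volatile and written again after the
   recursive call, so every level's frame must stay alive until the call
   returns; this deliberately prevents tail-call elimination. */
static void descend(int depth)
{
    volatile char marker = 0;
    deepest_frame = (uintptr_t)&marker;
    if (depth > 0)
        descend(depth - 1);
    marker = 1;
}

/* Approximate number of stack bytes used by `depth` nested calls.
   Treat the result as an estimate, not a portable measurement. */
size_t stack_bytes_for_depth(int depth)
{
    volatile char top = 0;
    uintptr_t start = (uintptr_t)&top;
    descend(depth);
    /* Works whichever direction the stack grows in. */
    return start > deepest_frame ? (size_t)(start - deepest_frame)
                                 : (size_t)(deepest_frame - start);
}
```

Calling stack_bytes_for_depth(10000) and dividing by 10000 gives an approximate bytes-per-call figure for your build.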
Related
I am trying to write my own _start function using inline assembly. But when I try to read argc and argv from the stack (%rsp and %rsp + 8), I get wrong values. I don't know what I am doing wrong.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <syscall.h>
int main(int argc, char *argv[]) {
    printf("%d\n", argc);
    printf("%s\n", argv[0]);
    printf("got here\n");
    return 0;
}
void _start() {
    __asm__(
        "xor %rbp, %rbp;"
        "movl (%rsp), %edi;"
        "lea 8(%rsp), %rsi;"
        "xor %rax, %rax;"
        "call main"
...
Terminal:
$ gcc test.c -nostartfiles
$ ./a.out one two three
0
Segmentation fault (core dumped)
$
Any idea where my mistake could be?
I am using an Ubuntu 20.04 VM.
This looks correct for a minimal _start, but you put it inside a non-naked C function. Compiler-generated code, e.g. push %rbp / mov %rsp, %rbp, will run before execution reaches the asm statement, so %rsp no longer points at argc when your code reads (%rsp). To see this, look at the gcc -S output, or single-step in a debugger such as GDB.
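Put your asm statement at global scope (like in How Get arguments value using inline assembly in C without Glibc?) or use __attribute__((naked)) on your _start(). Note that _start isn't really a function: nothing calls it, so there is no return address on the stack, and on entry %rsp points directly at argc.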
As a rule, never use GNU C Basic asm statements in a non-naked function. You might get this to work with -O3, because that implies -fomit-frame-pointer, so the stack pointer would still be pointing at argc and argv when your code ran, but that is not something you can rely on.
A dynamically linked executable on GNU/Linux will run libc's startup code from dynamic linker hooks, so you actually can use printf from _start without manually calling those init functions. That would not be the case if the executable were statically linked.
However, your main tries to return to your _start, and you don't show _start calling exit afterwards. You should call exit() rather than making an _exit system call directly, to make sure stdio buffers get flushed even when output is redirected to a file (which makes stdout fully buffered). Falling off the end of _start would be bad: the program would crash or get into an infinite loop, depending on what execution falls into.
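The point about stdio buffers can be observed without a custom _start. This sketch assumes a POSIX system (it uses fileno() and lseek()), and the function name buffering_demo is hypothetical. It writes to a fully buffered stream, the mode stdout uses when redirected to a file, and checks how many bytes had reached the underlying file descriptor before and after fflush(). Bytes still sitting in the buffer are what an _exit system call would silently discard, whereas exit() flushes them.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Writes `msg` (shorter than BUFSIZ) to a fully buffered temporary file and
   reports how many bytes had reached the file descriptor before and after
   fflush(). Returns 0 on success, -1 on failure. */
int buffering_demo(const char *msg, long *before, long *after)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    /* Full buffering, like stdout when it is redirected to a file. */
    if (setvbuf(f, NULL, _IOFBF, BUFSIZ) != 0) {
        fclose(f);
        return -1;
    }
    fputs(msg, f);
    /* File size as seen by the kernel: the data is still in stdio's buffer. */
    *before = (long)lseek(fileno(f), 0, SEEK_END);
    fflush(f); /* exit() does this for every open stream; _exit() does not */
    *after = (long)lseek(fileno(f), 0, SEEK_END);
    fclose(f);
    return 0;
}
```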
I wrote a simple function in order to check whether malloc works. I create an array of 1024*1024*1024 doubles (8 GiB) and fill it with numbers, but the heap does not seem to change. Here is the code:
#include <stdio.h>
#include <assert.h> // For assert()
#include <stdlib.h> // For malloc(), free() and realloc()
#include <unistd.h> // For sleep()
static void create_array_in_heap()
{
    double* b;
    b = (double*)malloc(sizeof(double) * 1024 * 1024 * 1024);
    assert(b != NULL); // Check that the allocation succeeded
    int i;
    for (i=0; i<1024*1024*1024; i++);
    b[i] = 1;
    sleep(10);
    free(b);
}
int main()
{
    create_array_in_heap();
    return 0;
}
[Screenshot of the Linux System Monitor]
Any ideas why?
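EDIT: a simpler explanation is given in the comments. Note that the stray ; after the for (...) header makes the loop body empty, so b[i] = 1 executes only once, after the loop, with i equal to 1024*1024*1024, i.e. one element past the end of the array. My answer applies once that ; has been removed.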
An aggressive optimizing compiler, such as Clang (as can be checked on Compiler Explorer), can see that the only important part of your function create_array_in_heap is the call to sleep. The rest has no observable effect, since you only fill a memory block to eventually discard it, so it is removed by the compiler. This is the entirety of your program compiled by Clang 7.0.0 with -O2:
main:                                   # @main
    pushq   %rax
    movl    $10, %edi
    callq   sleep
    xorl    %eax, %eax
    popq    %rcx
    retq
In order to benchmark any aspect of a program, the program should have been designed to output a result (computing and discarding the result is too easy for the compiler to optimize into nothing). The result should also be computed from inputs that aren't known at compile-time, otherwise the computation always produces the same result and can be optimized by constant propagation.
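As a sketch of that advice (the function name fill_and_sum is hypothetical), the function below returns a value computed from the array, and its inputs are parameters that a real benchmark would read at run time, e.g. from argv, with the caller printing the result. The optimiser is still free to transform the computation, and may even drop the heap array if it can produce the same sum without it, but it can no longer reduce the function to nothing.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fills a heap array of `n` doubles with `value` and returns their sum.
   Returns -1.0 if the allocation fails (an error convention chosen for
   this sketch only). */
double fill_and_sum(size_t n, double value)
{
    if (n == 0 || n > SIZE_MAX / sizeof(double))
        return -1.0;
    double *b = malloc(n * sizeof *b);
    if (b == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        b[i] = value;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += b[i];
    free(b);
    return sum;
}
```

A caller would then do something like printf("%f\n", fill_and_sum(strtoul(argv[1], NULL, 10), strtod(argv[2], NULL))); so that both the inputs and the use of the result are unknown at compile time.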
I'm trying to compile and run the following program without a main() function in C. I compiled it using the following command:
gcc -nostartfiles nomain.c
And the linker gives this warning:
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400340
OK, no problem. Then I ran the executable (a.out): both printf statements print successfully, and then I get a segmentation fault.
So my question is: why does it crash with a segmentation fault after successfully executing the print statements?
My code:
#include <stdio.h>
void nomain()
{
printf("Hello World...\n");
printf("Successfully run without main...\n");
}
Output:
Hello World...
Successfully run without main...
Segmentation fault (core dumped)
Note: here, the -nostartfiles gcc flag prevents the compiler from using the standard system startup files when linking.
Let's have a look at the generated assembly of your program:
.LC0:
.string "Hello World..."
.LC1:
.string "Successfully run without main..."
nomain:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
mov edi, OFFSET FLAT:.LC1
call puts
nop
pop rbp
ret
Note the ret instruction. Your program's entry point is determined to be nomain, and all is fine with that. But nothing called nomain, so there is no return address on the stack: when the function returns, ret pops whatever value happens to be at the top of the stack (at process start, that is argc) and jumps to it. That's an illegal access, and a segmentation fault follows.
A quick solution would be to call exit() at the end of your program (and assuming C11 we might as well mark the function as _Noreturn):
#include <stdio.h>
#include <stdlib.h>
_Noreturn void nomain(void)
{
printf("Hello World...\n");
printf("Successfully run without main...\n");
exit(0);
}
In fact, now your function behaves pretty much like a regular main function, since after returning from main, the exit function is called with main's return value.
In C, when functions/subroutines are called, the stack is typically populated in this order (on x86-64 the first few arguments are passed in registers instead):
the arguments,
the return address,
local variables (top of the stack).
Normally main() is the starting point; here, execution starts at whatever code the linker placed first, in this case nomain with its printfs.
Since nothing called nomain, there is no return address on the stack. When the function returns, it assumes that whatever value is on the stack at that location is the return address; unfortunately it is not, and hence the program crashes.
Can a C compiler ever optimize a loop by running it?
For example:
int num[] = {1, 2, 3, 4, 5}, i;
for(i = 0; i < sizeof(num)/sizeof(num[0]); i++) {
if(num[i] > 6) {
printf("Error in data\n");
exit(1);
}
}
Instead of running this each time the program is executed, can the compiler simply run this and optimize it away?
Let's have a look… (This really is the only way to tell.)
First, I've converted your snippet into something we can actually compile and run, and saved it in a file named main.c.
#include <stdio.h>
static int
f()
{
    const int num[] = {1, 2, 3, 4, 5};
    int i;
    for (i = 0; i < sizeof(num) / sizeof(num[0]); i++)
    {
        if (num[i] > 6)
        {
            printf("Error in data\n");
            return 1;
        }
    }
    return 0;
}
int
main()
{
    return f();
}
Running gcc -S -O3 main.c produces the following assembly file (in main.s).
    .file   "main.c"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB0:
    .section    .text.startup,"ax",@progbits
.LHOTB0:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB22:
    .cfi_startproc
    xorl    %eax, %eax
    ret
    .cfi_endproc
.LFE22:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE0:
    .section    .text.startup
.LHOTE0:
    .ident  "GCC: (GNU) 5.1.0"
    .section    .note.GNU-stack,"",@progbits
Even if you don't know assembly, you'll notice that the string "Error in data\n" is not present in the file so, apparently, some kind of optimization must have taken place.
If we look closer at the machine instructions generated for the main function,
xorl %eax, %eax
ret
We can see that all it does is XOR the EAX register with itself (which always yields zero), leaving that value in EAX, and then return. The EAX register holds the return value. As we can see, the f function was completely optimized away: the compiler evaluated the loop at compile time, found that no element exceeds 6, and replaced the call with the constant 0.
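For contrast, here is a sketch (the name check_data is hypothetical) in which the data is passed in rather than being a constant array. Compiled as a separate function whose arguments are unknown, the loop cannot be evaluated at compile time; the compiler may still inline it and fold it at a call site where the array contents are constants, as happened with f() above.

```c
#include <stddef.h>
#include <stdio.h>

/* Same check as f(), but on caller-supplied data. Returns 1 (after printing
   a message) if any element exceeds 6, otherwise 0. */
int check_data(const int *num, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (num[i] > 6) {
            printf("Error in data\n");
            return 1;
        }
    }
    return 0;
}
```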
Yes. Some C compilers unroll loops automatically with options such as -O3 and -Otime.
You didn't specify the compiler, but with gcc at -O3, and with the size calculation moved outside the for loop, it might be able to optimise it somewhat.
Compilers can do even better than that. Not only can compilers examine the effect of running code "forward", but the Standard even allows them to reason backwards about code in situations involving potential Undefined Behavior. For example, given:
#include <stdio.h>
int main(void)
{
    int ch = getchar();
    int q;
    if (ch == 'Z')
        q=5;
    printf("You typed %c and the magic value is %d", ch, q);
    return 0;
}
a compiler would be entitled to assume that the program will never receive any input which would cause the printf to be reached without q having received a value; since the only input character which would cause q to receive a value would be 'Z', a compiler could thus legitimately replace the code with:
int main(void)
{
    getchar();
    printf("You typed Z and the magic value is 5");
}
If the user types Z, the behavior of the original program will be well-defined, and the behavior of the latter will match it. If the user types anything else, the original program will invoke Undefined Behavior and, as a consequence, the Standard will impose no requirements on what the compiler may do. A compiler will be entitled to do anything it likes, including producing the same result as would be produced by typing Z.
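A well-defined variant gives q a value on every path, so the compiler can no longer assume the input was 'Z'. In this sketch the function name magic_value and the default of 0 are hypothetical choices, not part of the original example:

```c
/* q is initialised on every path, so no input leads to Undefined Behavior
   and the compiler must honour the comparison with 'Z'. */
int magic_value(int ch)
{
    int q = 0; /* default chosen for this sketch */
    if (ch == 'Z')
        q = 5;
    return q;
}
```

The original program would then print magic_value(ch) instead of reading the possibly uninitialised q.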
I have been reading The Shellcoder's Handbook (2e) and have been trying to reproduce the stack overflow experiment on pages 18-23.
I have this code
#include <stdio.h>
void return_input (void)
{
    char array[30];
    gets (array);
    printf("%s\n", array);
}
int main()
{
    return_input();
    return 0;
}
Compile: gcc -fno-stack-protector -o overflow overflow.c
Dump of assembler code for function main:
0x080483ea <main+0>: push %ebp
0x080483eb <main+1>: mov %esp,%ebp
0x080483ed <main+3>: call 0x80483c4 <return_input>
0x080483f2 <main+8>: mov $0x0,%eax
0x080483f7 <main+13>: pop %ebp
0x080483f8 <main+14>: ret
We can overwrite the saved return address with the address of the call to return_input() (0x080483ed, written little-endian as \xed\x83\x04\x08):
$ printf "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDD\xed\x83\x04\x08" | ./overflow
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDí
AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDò
So this causes our input to be printed twice. However, I wasn't prompted for input a second time. Shouldn't the second call to return_input() result in a second call to gets()?
This probably has to do with what gets() reads from stdin.
A slightly altered version of your program:
#include <stdio.h>
int n = 1;
void return_input(void)
{
    char array[30];
    gets (array);
    printf("%s\n", array);
    if (n--) return_input();
}
int main(void)
{
    return_input();
    return 0;
}
If I just run it, I can type in 2 short strings (each followed by the Enter key), like so:
C:\gets.exe
qwe
qwe
123
123
And here both qwe and 123 get repeated on the screen (first time when I type them, second, when they get printed).
When I pipe the program's input on Windows with the echo command, I get the following, without a chance to enter the second string; gets() somehow obtains garbage as input when it's called the second time:
C:\echo qwe|gets.exe
qwe
№ ☺
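So the second gets() call doesn't wait for input because the pipe is already at end-of-file. In that case gets() returns NULL and leaves array unchanged, so printf prints whatever happened to be in the buffer (here, uninitialized garbage). This has to do with how gets() behaves once piped input is exhausted, and nothing to do with stack overflows.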
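A safer pattern is to read with fgets() and check its return value, which tells you when the input, such as a pipe, has been exhausted. The helper name read_line is hypothetical; fgets() itself is standard C and, unlike gets() (removed in C11), never writes past the buffer size it is given.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from `in` into `buf` (at most size - 1 characters),
   stripping the trailing newline. Returns 1 on success, 0 on end-of-file
   or error, in which case `buf` must not be used. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

In the program above, return_input() would call read_line(stdin, array, sizeof array) and skip the printf when it returns 0.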