First, an abstraction of my program:
int main ()
{
    My_Struct1 ms1; // sizeof (My_Struct1) is 88712 B -- L1
    My_Struct2 ms2; // sizeof (My_Struct2) is 13208 B -- L2
    // 1. Invoke parser to fill in the two struct instances. -- L3
    printf ("%ul, %ul\n", &ms1, &ms2); // -- L3b, doesn't produce seg. fault.
    my_fun (&ms1, &ms2); // -- L4, does produce seg. fault.
    return 0;
}
If I run my program via the makefile, a segmentation fault occurs at L4 (always).
If I execute it directly from the shell (./executable), the segmentation fault does occur sometimes, but not always.
The error is Segmentation fault: Cannot access memory at address, reported at L4 for both &ms1 and &ms2. The type and location of the error are what gdb pointed out.
My guess is that the error is because of the size of the structures.
Please explain in detail what is going on.
The error behaviour is the same even after reducing the size of My_Struct1 to 8112 B and My_Struct2 to 1208 B.
I am working on:
Ubuntu 14.04
Intel® Core™ i5-4200M CPU @ 2.50GHz × 4
3.8 GiB memory
gcc - 4.8.4
First, compile with all warnings and debug info, probably with CFLAGS = -g -Wall -Wextra in your Makefile. Perhaps you might sometimes add some sanitize instrumentation options such as -fsanitize=address or -fsanitize=undefined (then it could be worthwhile to upgrade your GCC compiler to GCC 5 in March 2016). You might also want the -Wstack-usage=1000 warning and the -fstack-usage developer option.
Be very afraid of undefined behavior.
Then, enable core(5) dumps. Probably some ulimit -c 100000 (or whatever number is realistic) in your ~/.bashrc, then start a new terminal; check with cat /proc/self/limits (a Linux-specific command, related to proc(5)) that the limits are well set. See setrlimit(2).
Run your faulty test, e.g. with make test. You'll get a core dump. Check with ls -l core and file core.
At last, do a post-mortem debugging session. If your binary is someprog, run gdb someprog core. Probably the first gdb command you'll type will be bt.
Indeed, you are probably wrong in declaring quite large structs as local variables in main. The rule of thumb is to restrict each call frame to a few kilobytes at most (hence, never have a local variable of more than a kilobyte on the call stack), even though a typical call stack on Linux can grow to several megabytes. So I would recommend putting your large structs on the heap (use malloc and free appropriately; read about C dynamic memory allocation), as sketched below.
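A minimal sketch of that change, reusing the question's My_Struct1, My_Struct2 and my_fun (assumed to be declared in your headers):

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
    // Heap allocation instead of ~100 KB of locals in main's call frame.
    My_Struct1 *ms1 = malloc (sizeof *ms1);
    My_Struct2 *ms2 = malloc (sizeof *ms2);
    if (ms1 == NULL || ms2 == NULL) {
        perror ("malloc");
        return EXIT_FAILURE;
    }

    // 1. Invoke parser to fill in *ms1 and *ms2, then:
    my_fun (ms1, ms2);

    free (ms1);
    free (ms2);
    return 0;
}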
Also, run your program with valgrind.
BTW, the correct format for (void*) pointers is %p, so your added printf should be
printf("ms1#%p, ms2#%p\n", (void*)&ms1, (void*)&ms2);
Related
I have some example code here which I'm using to understand some C behaviour for a beginner's CTF:
// example.c
#include <stdio.h>
#include <stdlib.h> /* for getenv */

void main() {
    void (*print)();
    print = getenv("EGG");
    print();
}
Compile: gcc -z execstack -g -m32 -o example example.c
Usage: EGG=$(echo -ne '\x90\xc3') ./example
If I compile the code with the execstack flag, the program will execute the opcodes I've injected above. Without the flag, the program will crash due to a segmentation fault.
Why exactly is this? Is it because getenv is storing the actual opcodes on the stack, and the execstack flag allows jumps to the stack? Or does getenv push a pointer onto the stack, and there are some other rules about what sections of memory are executable? I read the manpage, but I couldn't work out exactly what the rules are and how they're enforced.
Another issue is that I think I'm also really lacking a good tool to visualise memory whilst debugging, so it's hard to figure this out. Any advice would be really appreciated.
getenv doesn't store the env var's value on the stack. It's already on the stack from process startup, and getenv obtains a pointer to it.
See the i386 System V ABI's description of where argv[] and envp[] are located at process startup: above [esp].
_start doesn't copy them before calling main, just calculates pointers to them to pass as args to main. (Links to the latest version at https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI, where the official current version is maintained.)
Your code is casting a pointer to stack memory (containing the value of an env var) into a function pointer and calling through it. Look at the compiler-generated asm (e.g. on https://godbolt.org/): it'll be something like call getenv / call eax.
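A quick way to see this for yourself (a sketch; the variable names are mine): print the address getenv returns next to the address of a local variable, and compare both with the [stack] mapping in /proc/self/maps.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int local = 0;
    char *egg = getenv("EGG");  // pointer into the environment block
    // Both addresses typically fall inside the [stack] mapping shown in
    // /proc/self/maps: getenv doesn't copy the value anywhere else.
    printf("local: %p, EGG value: %p\n", (void *)&local, (void *)egg);
    return 0;
}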
-zexecstack on your kernel version¹ makes all your pages executable, not just the stack. It also applies to the .data, .bss, and .rodata sections, and to memory allocated with malloc / new.
The exact mechanism on GNU/Linux was a "read-implies-exec" process-wide flag that affects all future allocations, including manual use of mmap. See Unexpected exec permission from mmap when assembly files included in the project for more about the GNU_STACK ELF header stuff.
Footnote 1: Linux after 5.4 or so only makes the stack itself executable, not READ_IMPLIES_EXEC: Linux default behavior of executable .data section changed between 5.4 and 5.9?
Fun fact: taking the address of a nested function that accesses its parent's local variables gets gcc to enable -zexecstack. It stores code for an executable "trampoline" on the stack that passes a "static chain" pointer to the actual nested function, allowing it to reference its parent's stack frame.
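To see the trampoline in action (a sketch; nested functions are a GNU C extension, so GCC only):

#include <stdio.h>

int main(void) {
    int x = 42;
    int get_x(void) { return x; }  // nested function using the parent's local
    int (*fp)(void) = get_x;       // taking its address forces a stack
                                   // trampoline, hence an executable stack
    printf("%d\n", fp());
    return 0;
}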
If you wanted to exec data as code without -zexecstack, you'd use mprotect(PROT_EXEC|PROT_READ|PROT_WRITE) on the page containing that env var. (It's part of your stack so you shouldn't remove write permission; it could be in the same page as main's stack frame for example.)
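Something like the following untested sketch (error handling trimmed; it assumes the whole value sits inside one page):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    char *egg = getenv("EGG");
    if (egg == NULL)
        return 1;

    // Round down to the start of the page holding the env var's value.
    uintptr_t pagesz = (uintptr_t)sysconf(_SC_PAGESIZE);
    void *page = (void *)((uintptr_t)egg & ~(pagesz - 1));

    // Keep read and write permission: this page is part of the stack and
    // may also hold live data such as main's stack frame.
    if (mprotect(page, pagesz, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        perror("mprotect");
        return 1;
    }

    void (*fn)(void) = (void (*)(void))egg;  // not portable C, but that's the point
    fn();
    return 0;
}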
Related:
With GNU/Linux ld from binutils before late 2018 or so, the .rodata section is linked into the same ELF segment as the .text section, and thus const char code[] = {0xc3} or string literals are executable.
Current ld gives .rodata its own segment that's mapped read without exec, so finding ROP / Spectre "gadgets" in read-only data is no longer possible, unless you use -zexecstack. And even that doesn't work on current kernels; char code[] = ...; as a local inside a function will put data on the stack where it's actually executable. See How to get c code to execute hex machine code? for details.
When I use the valgrind bbv tool, I encounter a problem: the instruction count valgrind reports for an executable is quite different from the count reported by the PMU.
For example, with SPEC2006 omnetpp the instruction count is about 57190 billion according to valgrind, but about 57290 billion according to the PMU. The environment in which the program runs is the same.
Then we wrote a simple program to verify this; the result is that valgrind's instruction count is about 800 less.
The test code:
#include <unistd.h>

int main(void)
{
    unsigned int i = 0, sum = 0;
    sum += i;
    return 0;
}
valgrind --tool=exp-bbv ./withmain
Total instructions: 5232
simpleperf stat -e instructions:u ./withmain (just count instructions in userspace)
Performance counter statistics:
6,043 instructions:u # (100%)
2. Then we found that functions like _start and _init are UNKNOWN in valgrind, and that the symbol size of those functions is zero. Valgrind seems to ignore them during its analysis.
I found the following description of this kind of glibc function in m_main.c:
If linking of the final executables is done with glibc present, then Valgrind starts at main() above as usual, and all of the following code is irrelevant.
However, this is not the intended mode of use. The plan is to avoid linking against glibc, by giving gcc the flags -nodefaultlibs -lgcc -nostartfiles at startup.
Q:
1) Does anyone know why valgrind can't analyze these functions?
2) Does the description above mean that we have to avoid using glibc functions? Is there any other way to resolve this difference in instruction counts?
3) Why is the size of some function symbols (such as _start, _init, etc.) zero in the ELF file? When I objdump that file, I can see the whole code of those functions.
3. The program environment:
Operating environment: Android 10 (aarch64 Linux 4.14)
CPU core: ARMv8 Cortex-A55
Cross-compiler: gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu (both valgrind and the test code are compiled with this cross tool; --static, -g, -O3 and other parameters have already been tried). I have tried gcc-4.8.2, which has the same problem.
On doing some experiments whilst learning C, I've come across something odd. This is my program:
int main(void) {sleep(5);}
When it is compiled, the file size of the executable is 8496 bytes (in comparison to the 26-byte source!). This is understandable, as sleep is called and the instructions for calling it are written into the executable. Another point to make is that without the sleep, the executable becomes 4312 bytes:
int main(void) {}
My main question is what happens when the first program is run. I'm using clang to compile and Mac OS X to run it. The result (according to Activity Monitor) is that the program uses 504KB of "real memory". Why is it so big when the program is just 4KB? I am assuming that the executable is loaded into memory but I haven't done anything apart from a sleep call. Why does my program need 500KB to sleep for five seconds?
By the way, the reason I'm using sleep is to be able to catch the amount of memory being used using Activity Monitor in the first place.
I ask simply out of curiosity, cheers!
When you compile a C program, it is linked into an executable. Even though your program is very small, it links against the C runtime, which includes some additional code. There may be some error handling, that error handling may write to the console, and that code may pull in sprintf, which adds some footprint to your application. You can ask the linker to produce a map of the code in your executable to see what is actually included.
Also, an executable file contains more than machine code. There will be various tables for data and dynamic linking which increase the size of the executable, and there may also be some wasted space because the various parts are stored in blocks.
The C runtime will initialize before main is called, and this results both in some code being loaded (e.g. by dynamically linking to various operating system features) and in memory being allocated for a heap, a stack for each thread, and probably some static data. Not all of this data may show as "real memory": the default stack size on OS X appears to be 8 MB, and your application is still using much less than that.
In this case I suppose the size difference you've observed is significantly caused by dynamic linking.
Linkers usually don't place common code into executable binaries; instead they record the information needed so that the code can be loaded when the binary is loaded. That common code is stored in files called shared objects (SO) or dynamically linked libraries (DLL).
[pengyu@GLaDOS temp]$ cat test.c
int main(void) {sleep(5);}
[pengyu@GLaDOS temp]$ gcc test.c
[pengyu@GLaDOS temp]$ du -h --apparent-size a.out
6.6K a.out
[pengyu@GLaDOS temp]$ gcc test.c -static
[pengyu@GLaDOS temp]$ du -h --apparent-size a.out
807K a.out
Also, here I'm listing what is in the memory of a process:
There are the necessary dynamic libraries to be loaded in.
Here ldd shows the dynamic libraries that are loaded when the binary is invoked. These libraries live in regions obtained by calling the mmap system call.
[pengyu@GLaDOS temp]$ cat test.c
int main(void) {sleep(5);}
[pengyu@GLaDOS temp]$ gcc test.c
[pengyu@GLaDOS temp]$ ldd ./a.out
	linux-vdso.so.1 (0x00007fff576df000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f547a212000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f547a5bd000)
There are sections like .data and .text to be allocated for data and code from your binary file.
This part exists in the binary executable, so its size should be no larger than the file itself. Contents are copied at the loading stage of an executable binary.
There are sections like .bss, and also the stack zone, to be allocated for dynamic use during execution of the program.
This part does not exist in the binary executable, so its size can be quite large without being affected by the size of the file itself.
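A small illustration of the difference (a sketch, assuming typical GCC/ELF behaviour):

int initialized = 42;  // .data: bytes stored in the file, copied at load
int zeroed[1 << 20];   // .bss: takes no file space, allocated (zeroed) at load

int main(void) { return initialized + zeroed[0]; }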
Plus, the program runs on an ARM device running Linux, and I can print out stack info and register values in the SIGSEGV handler I install.
The problem is that I can't compile with the -g option, since the bug may not reproduce due to the performance downgrade.
Compiling with the -g option to gcc does not cause a "performance downgrade". All it does is cause debugging symbols to be included; it does not affect the optimisation or code generation.
If you install your SIGSEGV handler using the sa_sigaction member of the sigaction struct passed to sigaction(), then the si_addr member of the siginfo_t structure passed to your handler contains the faulting address.
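A minimal sketch of such a handler (the names and the deliberate fault are mine; fprintf is not strictly async-signal-safe, but it is common in debug code):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void segv_handler(int signum, siginfo_t *info, void *context)
{
    (void)signum;
    (void)context;
    fprintf(stderr, "SIGSEGV at address %p\n", info->si_addr);
    _exit(1);  // don't return, or the faulting instruction runs again
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;  // deliver the three-argument handler
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    volatile int *p = NULL;
    *p = 42;  // deliberate fault to demonstrate
    return 0;
}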
I tend to use valgrind, which indicates leaks and memory access faults.
This seems to work
http://tlug.up.ac.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV
static void signal_segv(int signum, siginfo_t *info, void *ptr) {
    // info->si_addr is the illegal address
}
If you are worried about using -g on the binary that you load on the device, you may be able to use gdbserver on the ARM device with a stripped version of the executable and run arm-gdb on your development machine with the unstripped version of the executable. The stripped version and the unstripped version need to match up to do this, so do this:
# You may add your own optimization flags
arm-gcc -g program.c -o program.debug
arm-strip --strip-debug program.debug -o program
# or
arm-strip --strip-unneeded program.debug -o program
You'll need to read the gdb and gdbserver documentation to figure out how to use them. It's not that difficult, but it isn't as polished as it could be. Mainly it's very easy to accidentally tell gdb to do something that it ends up thinking you meant to do locally, so it will switch out of remote debugging mode.
You may also want to use the backtrace() function, if available, which provides the call stack at the time of the crash. This can be used to dump the stack the way a high-level language does when a C program gets a segmentation fault, bus error, or other memory violation error.
backtrace() is available both on Linux and Mac OS X.
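A sketch of using it from a handler (the names and the deliberate fault are mine; backtrace_symbols_fd writes straight to a file descriptor, avoiding malloc inside the handler; link with -rdynamic so function names resolve):

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

static void segv_backtrace(int signum)
{
    (void)signum;
    void *frames[64];
    int n = backtrace(frames, 64);                   // capture return addresses
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  // symbolized dump
    _exit(1);
}

int main(void)
{
    signal(SIGSEGV, segv_backtrace);

    volatile int *p = NULL;
    *p = 42;  // deliberate fault: the handler prints the call stack
    return 0;
}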
If the -g option makes the error disappear, then knowing where it crashes is unlikely to be useful anyway. It's probably writing through an uninitialized pointer in function A, and then function B tries to legitimately use that memory and dies. Memory errors are a pain.