Why can't I set breakpoints using memory addresses? - c

I am learning assembly, processor architecture, and exploit development, and I came across a tutorial on x86_64 buffer overflows, so I copied the vulnerable code and compiled it with gcc. My compiled binary does not let me set breakpoints, but when I downloaded the binary from the website (which I did not want to do) it worked fine and the memory addresses were normal.
But when I dump main in my compiled program with gdb, the memory addresses look like this:
0x000000000000085e <+83>: lea -0xd0(%rbp),%rax
End of assembler dump.
When I try to set a breakpoint after the scanf function:
(gdb) break *0x000000000000085e
Breakpoint 1 at 0x85e
(gdb) run
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void validate(char *pass) {
    if (strcmp(pass, "[REDACTED]") == 0) {
        printf("ACCESS GRANTED!");
        printf("Oh that's just idio... Oh my god!\n");
    } else {
        printf("Damn it, I had something for this...\n");
    }
}
int main(int argc, char **argv) {
    char password[200];
    printf("C:/ENTER PASSWORD: ");
    scanf("%s", password);
    validate(password);
    return 0;
}

You can set breakpoints on virtual addresses, but objdump doesn't know where your PIE executable will be mapped into memory, so it uses 0 as a base address. To make things simpler, disable PIE (which your distro apparently enables by default). Presumably your tutorial was written before this was common. Use gcc -fno-pie -no-pie -g foo.c -o foo. Then addresses you see in objdump -drwC -Mintel will match run-time addresses.
But IDK why you want numeric addresses; use b main and single-step from there. Even if you leave out -g, you'll still have symbol names for functions.
To solve the question as asked, see Stopping at the first machine code instruction in GDB and Set a breakpoint on GDB entry point for stripped PIE binaries without disabling ASLR.
Once you have a running process from your executable, you can p &main or disas main to find the actual runtime address of main. But note that gdb disables ASLR, so if you use code addresses you find with GDB in your exploit against a PIE executable, they will only work when run under GDB. Running it "normally" will randomize the virtual address where your executable is mapped. (This is why I suggested building a position-dependent executable). But more likely you just want to return to executable code on an executable stack, in which case it's stack ASLR that matters, and stack-ASLR still happens in plain old position-dependent executables (unless you disable it too, like gdb does).
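To make the base-address point concrete: the 0x85e that GDB printed before `run` is only an offset, and it becomes a usable address once the load base is added. Below is a minimal sketch of that arithmetic. The helper name `runtime_address` is hypothetical, and the base 0x555555554000 used in the note afterwards is an assumption (it is the value GDB commonly shows for a PIE on x86-64 Linux with ASLR disabled; check `info proc mappings` in your own session).

```c
#include <stdint.h>

/* Hypothetical helper: turn a link-time offset (as printed by objdump, or by
 * GDB before the process has started) into a run-time virtual address.
 * load_base is whatever address the executable was actually mapped at. */
uint64_t runtime_address(uint64_t load_base, uint64_t link_time_offset) {
    return load_base + link_time_offset;
}
```

With the assumed base, `runtime_address(0x555555554000, 0x85e)` is 0x55555555485e, which is the kind of value `break *ADDR` needs once the process is running. For a non-PIE build the base is effectively already folded into the addresses you see, so no adjustment is required.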

Related

The libc function "system" does not get linked in the executable produced by GCC even after using the static flag

I am new to systems programming. I was just trying to implement the ret2libc attack on my own. To implement that, I need the address of the start of the libc function "system" in the executable.
I tried to do static linking, but to my surprise, the "system" function is not getting linked in my executable.
Below is a simple program saved as t.c:
#include <stdio.h>
#include <stdlib.h>
int fun()
{
    int x;
}
int main()
{
    fun();
    int x;
    x=10;
    printf("%d\n",x);
    return 0;
}
Below is the output which I get when using GDB:
abhishek@abhishek:~$ gcc -g -static t.c -o t
abhishek@abhishek:~$ gdb -q t
Reading symbols from t...
(gdb) print system
No symbol "system" in current context.
(gdb) print memcpy
$1 = {<text gnu-indirect-function variable, no debug info>} 0x422b40 <memcpy>
(gdb) print strcmp
$2 = {<text gnu-indirect-function variable, no debug info>} 0x421b50 <strcmp_ifunc>
(gdb) print printf
$3 = {<text variable, no debug info>} 0x40b5b0 <printf>
(gdb) quit
abhishek@abhishek:~$
libc functions like memcpy, strcmp and even printf are getting linked, but not the function system. The tutorials on the internet just ask you to get the address of system and then proceed accordingly. But I am unable to get the address of system in the first place.
Could anyone help me understand why the function system is not linked even when I am using the -static flag in GCC?
If an executable is linked against a library, and the library is built correctly*, only called functions of the library are included in the final executable.
Since your program does not call system(), and no other function calls it, it is apparently not included.
The solution is to call system(), for example in an unused control branch.
*) A library commonly contains modules, which are compiled from translation units. Such a translation unit is commonly a source file. For example, if your libc were built with a module that includes both printf() and system(), the latter function would be in the executable, even if it only calls printf().
Common linkers only include modules that resolve references that are unresolved at that step.
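The "unused control branch" suggestion could look roughly like the sketch below. The names `never_true` and `keep_system_linked` are made up for illustration; the `volatile` qualifier is there on the assumption that the compiler would otherwise prove the branch dead and drop the reference, in which case the linker would again have no reason to pull in the module that defines system().

```c
#include <stdlib.h>

/* volatile: the compiler must read this at run time, so it cannot
 * delete the branch below as dead code. */
static volatile int never_true = 0;

/* Hypothetical helper: never actually runs a command, but leaves an
 * unresolved reference to system() for the static linker to satisfy. */
int keep_system_linked(void) {
    if (never_true) {
        return system("true");
    }
    return 0;
}
```

Calling `keep_system_linked()` once from main and rebuilding with `gcc -g -static` should then let `print system` in GDB resolve to an address, since the symbol is now referenced.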

Buffer overflow, pointing to the proper address but still not working [duplicate]

I have some example code here which I'm using to understand some C behaviour for a beginner's CTF:
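// example.c
#include <stdio.h>
#include <stdlib.h>  // declares getenv
void main() {
    void (*print)();
    // Treat the env var's bytes as code: cast the data pointer to a function pointer.
    print = (void (*)())getenv("EGG");
    print();
}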
Compile: gcc -z execstack -g -m32 -o example example.c
Usage: EGG=$(echo -ne '\x90\xc3') ./example
If I compile the code with the execstack flag, the program will execute the opcodes I've injected above. Without the flag, the program will crash due to a segmentation fault.
Why exactly is this? Is it because getenv is storing the actual opcodes on the stack, and the execstack flag allows jumps to the stack? Or does getenv push a pointer onto the stack, and there are some other rules about what sections of memory are executable? I read the manpage, but I couldn't work out exactly what the rules are and how they're enforced.
Another issue is that I think I'm also really lacking a good tool to visualise memory whilst debugging, so it's hard to figure this out. Any advice would be really appreciated.
getenv doesn't store the env var's value on the stack. It's already on the stack from process startup, and getenv obtains a pointer to it.
See the i386 System V ABI's description of where argv[] and envp[] are located at process startup: above [esp].
_start doesn't copy them before calling main, just calculates pointers to them to pass as args to main. (Links to the latest version at https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI, where the official current version is maintained.)
Your code is casting a pointer to stack memory (containing the value of an env var) into a function pointer and calling through it. Look at the compiler-generated asm (e.g. on https://godbolt.org/): it'll be something like call getenv / call eax.
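One way to check where such a pointer lands, without a debugger, is to look it up in /proc/self/maps. The following is a Linux-only sketch; `address_in_stack_mapping` is a hypothetical helper, and it assumes the kernel labels the main thread's stack mapping with `[stack]`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if addr lies inside the mapping labelled [stack] in
 * /proc/self/maps, 0 if it does not, and -1 if the file cannot be
 * opened (for example on a non-Linux system). */
int address_in_stack_mapping(const void *addr) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        return -1;
    }
    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        if (strstr(line, "[stack]") == NULL) {
            continue;               /* only the stack mapping is of interest */
        }
        if (sscanf(line, "%lx-%lx", &start, &end) != 2) {
            continue;               /* unexpected line format, skip it */
        }
        uintptr_t a = (uintptr_t)addr;
        if (a >= start && a < end) {
            found = 1;
        }
    }
    fclose(maps);
    return found;
}
```

Passing the result of `getenv("EGG")` would be expected to return 1 when EGG was already in the environment at process startup, the same as passing the address of a local variable in main. The permission column of that same line (`rw-p` versus `rwxp`) also shows whether the stack is currently executable.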
-zexecstack in your kernel version (see footnote 1) makes all your pages executable, not just the stack. It also applies to .data, .bss, and .rodata sections, and memory allocated with malloc / new.
The exact mechanism on GNU/Linux was a "read-implies-exec" process-wide flag that affects all future allocations, including manual use of mmap. See Unexpected exec permission from mmap when assembly files included in the project for more about the GNU_STACK ELF header stuff.
Footnote 1: Linux after 5.4 or so only makes the stack itself executable, not READ_IMPLIES_EXEC: Linux default behavior of executable .data section changed between 5.4 and 5.9?
Fun fact: taking the address of a nested function that accesses its parent's local variables gets gcc to enable -zexecstack. It stores code for an executable "trampoline" onto the stack that passes a "static chain" pointer to the actual nested function, allowing it to reference its parent's stack frame.
If you wanted to exec data as code without -zexecstack, you'd use mprotect(PROT_EXEC|PROT_READ|PROT_WRITE) on the page containing that env var. (It's part of your stack so you shouldn't remove write permission; it could be in the same page as main's stack frame for example.)
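A rough sketch of that mprotect approach follows. mprotect needs a page-aligned start address, so the pointer is first rounded down to its page. Both helper names are hypothetical, and the sketch assumes the page size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of the page that contains it.
 * Assumes the page size is a power of two. */
void *page_start(const void *addr) {
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    return (void *)((uintptr_t)addr & ~(page_size - 1));
}

/* Hypothetical helper: make the pages covering [addr, addr + len)
 * readable, writable and executable. Write permission is kept because
 * the range may share a page with live stack data.
 * Returns 0 on success, -1 with errno set on failure. */
int make_range_executable(const void *addr, size_t len) {
    char *start = page_start(addr);
    size_t span = (size_t)((const char *)addr - start) + len;
    return mprotect(start, span, PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

In the example program this would be called as `make_range_executable(egg, strlen(egg) + 1)` on the pointer returned by getenv, before calling through it. The call can fail on systems whose security policy forbids mappings that are both writable and executable, so the return value is worth checking.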
Related:
With GNU/Linux ld from binutils before late 2018 or so, the .rodata section is linked into the same ELF segment as the .text section, and thus const char code[] = {0xc3} or string literals are executable.
Current ld gives .rodata its own segment that's mapped read without exec, so finding ROP / Spectre "gadgets" in read-only data is no longer possible, unless you use -zexecstack. And even that doesn't work on current kernels; char code[] = ...; as a local inside a function will put data on the stack where it's actually executable. See How to get c code to execute hex machine code? for details.

Why would a simple C program need syscalls?

Related to this other question. I am trying to run this simple C program in gem5:
int main() {
int a=1, b=2;
int c=a+b;
return c;
}
And it fails because gem5 doesn't have some syscalls implemented.
My question is, why would a simple program like this require syscalls? This should run bare-metal without trouble. Is there a way to compile this to avoid syscalls? I am using arm-linux-gnueabi-gcc -static -DUNIX to compile it.
Without syscalls the program cannot exit. The way it works is typically something like this:
// Not how it's actually implemented... just a sketch.
void _start() {
char **argv = ...;
int argc = ...;
// ... other initialization code ...
int retcode = main(argc, argv);
exit(retcode);
}
The exact details depend on the operating system, but exit(), which terminates the process, typically has to be a system call or is implemented with system calls.
Note that this is true for "hosted" C implementations, not for "freestanding" C implementations, and is highly operating-system specific. There are freestanding C implementations that can run on bare metal, but hosted C implementations usually need an operating system.
You can compile without standard libraries and without the runtime but your entry point cannot return... there is nothing to return to, without a runtime.
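To see that process exit really is a system call, one can issue it directly rather than returning from main. The sketch below does so in a forked child, so the parent can read back the status. It assumes a Unix-like system that provides `syscall()` and `SYS_exit`; the function name is hypothetical.

```c
#define _GNU_SOURCE          /* for the syscall() declaration on glibc */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that terminates by invoking the exit system call directly,
 * then return the exit status the parent observes (-1 on any failure). */
int exit_status_via_raw_syscall(int code) {
    pid_t child = fork();
    if (child < 0) {
        return -1;
    }
    if (child == 0) {
        syscall(SYS_exit, code);   /* what the C runtime's exit path ends in */
        _exit(127);                /* only reached if the syscall failed */
    }
    int status = 0;
    if (waitpid(child, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

`exit_status_via_raw_syscall(3)` mirrors the question's program, whose main returns 3: the value only reaches the parent because something made that system call on the program's behalf.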
Creating a baremetal program
It is generally possible to compile programs capable of running baremetal.
Use -ffreestanding. This makes GCC generate code that does not assume that the standard library is available (and has other effects).
Use -nostdlib. This will prevent GCC from linking with the standard library. Note that memcmp, memset, memcpy, and memmove calls may be generated anyway, so you may have to provide these yourself.
At this point you can write your program, but you typically have to use _start instead of main:
void _start(void) {
while (1) { }
}
Note that you can't return from _start! Think about it... there is nowhere to return to. When you compile a program like this you can see that it doesn't use any system calls and doesn't have a loader.
$ gcc -ffreestanding -nostdlib test.c
We can verify that it loads no libraries:
$ ldd a.out
statically linked
$ readelf -d a.out
Dynamic section at offset 0xf30 contains 8 entries:
Tag Type Name/Value
0x000000006ffffef5 (GNU_HASH) 0x278
0x0000000000000005 (STRTAB) 0x2b0
0x0000000000000006 (SYMTAB) 0x298
0x000000000000000a (STRSZ) 1 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x000000006ffffffb (FLAGS_1) Flags: PIE
0x0000000000000000 (NULL) 0x0
We can also see that it doesn't contain any code that makes system calls:
$ objdump -d a.out
a.out: file format elf64-x86-64
Disassembly of section .text:
00000000000002c0 <_start>:
2c0: eb fe jmp 2c0 <_start>
My question is, why would a simple program like this require syscalls?
The run-time loader ld.so does syscalls. The C run-time does syscalls. Do strace <application> and see.
There are some parameters to gcc you might want to check out. Among others:
-ffreestanding
-nostdlib
-nodefaultlibs
My question is, why would a simple program like this require syscalls?
Because entering main and exiting the program is based on syscalls.
Compiling with arm-unknown-linux-uclibcgnueabi solved the issue. Apparently the uClibc implementation doesn't use the syscalls that gem5 doesn't have implemented.

Minimum 504Kb Memory Usage

On doing some experiments whilst learning C, I've come across something odd. This is my program:
int main(void) {sleep(5);}
When it is compiled, the file size of the executable is 8496 bytes (in comparison to the 26 byte source!) This is understandable as sleep is called and the instructions for calling that are written in the executable. Another point to make is that without the sleep, the executable becomes 4312 bytes.
int main(void) {}
My main question is what happens when the first program is run. I'm using clang to compile and Mac OS X to run it. The result (according to Activity Monitor) is that the program uses 504KB of "real memory". Why is it so big when the program is just 4KB? I am assuming that the executable is loaded into memory but I haven't done anything apart from a sleep call. Why does my program need 500KB to sleep for five seconds?
By the way, the reason I'm using sleep is to be able to catch the amount of memory being used using Activity Monitor in the first place.
I ask simply out of curiosity, cheers!
When you compile a C program it is linked into an executable. Even though your program is very small it will link to the C runtime which will include some additional code. There may be some error handling and this error handling may write to the console and this code may include sprintf which adds some footprint to your application. You can request the linker to produce a map of the code in your executable to see what is actually included.
Also, an executable file contains more than machine code. There will be various tables for data and dynamic linking which will increase the size of the executable and there may also be some wasted space because the various parts are stored in blocks.
The C runtime will initialize before main is called and this will result in both some code being loaded (e.g. by dynamically linking to various operating system features) as well as memory being allocated for a heap, a stack for each thread and probably also some static data. Not all of this data may show as "real memory" - the default stack size on OS X appears to be 8 MB and your application is still using much less than this.
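Both numbers mentioned here can be queried from inside the process instead of read off Activity Monitor. Here is a small sketch using getrusage and getrlimit; the function names are hypothetical. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Peak resident set size of this process as reported by the kernel
 * (kilobytes on Linux, bytes on macOS). Returns -1 on failure. */
long peak_resident_set_size(void) {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}

/* Soft limit on the main thread's stack size, in bytes (may be
 * RLIM_INFINITY). Returns 0 on success, -1 on failure. */
int stack_soft_limit(rlim_t *limit_out) {
    struct rlimit limit;
    if (getrlimit(RLIMIT_STACK, &limit) != 0) {
        return -1;
    }
    *limit_out = limit.rlim_cur;
    return 0;
}
```

Printing both from an otherwise empty main shows the gap the answer describes: the stack limit is address space the process may grow into, while the resident figure counts only pages that have actually been touched.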
In this case I suppose the size difference you've observed is significantly caused by dynamic linking.
Linkers usually don't place common code into the executable binary; instead they record a reference to it, and the code is loaded when the binary is loaded. This common code is stored in files called shared objects (SO) or dynamically linked libraries (DLL).
[pengyu@GLaDOS temp]$ cat test.c
int main(void) {sleep(5);}
[pengyu@GLaDOS temp]$ gcc test.c
[pengyu@GLaDOS temp]$ du -h --apparent-size a.out
6.6K a.out
[pengyu@GLaDOS temp]$ gcc test.c -static
[pengyu@GLaDOS temp]$ du -h --apparent-size a.out
807K a.out
Also, here is what resides in the memory of a process:
There are the necessary dynamic libraries to be loaded:
Here ldd lists the dynamic libraries that are loaded when the binary is invoked. These libraries are placed in regions obtained by calling the mmap system call.
[pengyu@GLaDOS temp]$ cat test.c
int main(void) {sleep(5);}
[pengyu@GLaDOS temp]$ gcc test.c
[pengyu@GLaDOS temp]$ ldd ./a.out
linux-vdso.so.1 (0x00007fff576df000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f547a212000)
/lib64/ld-linux-x86-64.so.2 (0x00007f547a5bd000)
There are sections like .data and .text that are allocated for data and code from your binary file.
This part exists in the binary executable, so its size is supposed to be no larger than the file itself. Its contents are copied in when the executable is loaded.
There are sections like .bss, and also the stack area, that are allocated for dynamic use during execution of the program.
This part does not exist in the binary executable, so the size could be quite large without being affected by size of the file itself.
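The .bss case is easy to try out. In the sketch below, a zero-initialized static array is declared; on typical ELF toolchains it is placed in .bss, so the array's size shows up in the process's address space but not in the file size. The names are made up for illustration.

```c
#include <stddef.h>

/* Zero-initialized static storage: typically placed in .bss, which is
 * recorded in the executable as a size only, not as a megabyte of zeros. */
static char scratch[1 << 20];

/* Size of the buffer in bytes. */
size_t bss_buffer_size(void) {
    return sizeof scratch;
}

/* Returns 1 if every byte is zero, as the C standard guarantees for
 * static storage without an explicit initializer; 0 otherwise. */
int bss_buffer_is_zeroed(void) {
    for (size_t i = 0; i < sizeof scratch; i++) {
        if (scratch[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Comparing the output of `ls -l` and `size` on a binary built with and without such an array should show the bss column growing by about 1 MiB while the file size barely changes.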

Resources