Let's focus only on Rect_IsEmpty() function.
The nm command gives me this output:
(...)
00021af0 T Rect_IsEmpty
(...)
On the other hand, when I launch gdb and see the address of this function, I get:
(gdb) info address Rect_IsEmpty
Symbol "Rect_IsEmpty" is at 0x8057c84 in a file compiled without debugging.
Could anyone, please, explain why these addresses are not the same? Where does gdb get this address from?
nm gives you the address recorded for the symbol in the executable's symbol table, which is an offset relative to the link-time base, while gdb gives you the actual virtual address in the process's memory, which can change every time you run the process. (Before run or start in GDB, it uses the same method as nm to get symbol addresses, using the same placeholder base address in PIE executables.)
nm is just a tool which shows you offset from the beginning of the code segment. In your case:
00021af0 T Rect_IsEmpty
simply means that the symbol Rect_IsEmpty would have address 00021af0 if the executable were mapped at an image base of 0x1000, a dummy placeholder value that ld uses by default when linking a PIE. Normally the code segment comes first, with .text at the start of it, so the start of it will show an address of 0x1000 in nm or objdump -d.
When running a Position-Independent Executable on Linux, the ASLR mechanism is used for randomizing the base addresses of the whole thing, to something other than 0x1000. (Segments keep the same relative offset from each other, so PC-relative addressing can work, e.g. for x86-64 RIP-relative addressing of .data and .rodata from .text.)
GDB disables actual randomization, but the kernel still uses a high base address, not 0x1000. It will be the same one every time.
(If you built a traditional non-PIE executable, the kernel would have no choice where to load it; it would be the linker's choice, for example 0x400000, which nm and objdump can see, as could GDB without starting the program. Use gcc -fno-pie -no-pie if you want that.)
When you look up the address of the function in a debugger, you see the address of the symbol inside the process's code segment after ASLR has already done its job.
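To see the relationship between the two numbers from inside a running program, here is a minimal sketch, assuming Linux with glibc (it relies on getauxval and the ElfW macro from link.h). The helper names main_load_bias and link_time_address are made up for illustration. The load bias is recovered by comparing where the program headers actually are (AT_PHDR) with where the file says they should be (the PT_PHDR entry); subtracting that bias from a runtime address should give the value nm prints.

```c
#include <elf.h>
#include <link.h>      /* ElfW() selects Elf32_* or Elf64_* for this build */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval() */

/* Hypothetical helper: difference between runtime and link-time addresses
   of the main executable. Expected to be 0 for a non-PIE executable. */
uintptr_t main_load_bias(void) {
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    unsigned long count = getauxval(AT_PHNUM);

    if (phdr == NULL)
        return 0;
    for (unsigned long i = 0; i < count; i++) {
        /* PT_PHDR records the link-time address of the header table itself. */
        if (phdr[i].p_type == PT_PHDR)
            return (uintptr_t)phdr - (uintptr_t)phdr[i].p_vaddr;
    }
    return 0; /* no PT_PHDR entry: assume the image was not relocated */
}

/* Hypothetical helper: turn a runtime address back into the link-time
   address that nm or objdump would show for the same symbol. */
uintptr_t link_time_address(const void *runtime_addr) {
    return (uintptr_t)runtime_addr - main_load_bias();
}
```

Printing link_time_address of a function next to its nm entry is a quick way to check that only the base differs between the two tools.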
Here is a good article from IBM about shared libraries and another one about Procedure Linkage Table and Global Offset Table.
The executable is loaded at a different memory location on each run, so every address within it changes as well. Any function will therefore have a different memory address than in its prior execution.
Regarding your question: GDB gets the address from the symbol information in the file (your output says it was compiled without debugging info, so that is the symbol table) and shows the absolute memory address.
I'm trying to get the kernel entry function in the UEFI bootloader and I'm so confused.
Why does this code work?
int (*kmain)(void*) = (int(*)(void*)) (elf->entry);
this is what I link it with
gcc -no-pie -nostdlib -ffreestanding -e kmain -o kernel.elf kernel.o
I know it has something to do with -no-pie, since without it, it won't work.
elf->entry is a virtual address, but since I am in the bootloader it references a physical address, right?
How can the linker know what to set the entry to, without having access to ram?
What if elf->entry is 0x4000, then it goes into the physical address 0x4000 but WHAT if physical address 0x4000 is already in use by something else?
Without -no-pie I had to do it with base + elf->entry, where base is the start of the ELF file, and that I can totally understand, but I can't understand how just elf->entry can be OK.
The linker, as a rule, doesn't really care where in memory it places anything. Its primary job is to make sure that all memory references are consistent, no matter how memory is laid out. The purpose of a linker script is to tell the linker how to lay out the memory. If you don't provide a linker script, it will use its own defaults. In other words, the linker doesn't know or care whether you have something already loaded at 0x4000. It's your job to know how your memory is laid out, and to provide a linker script if you want it laid out in a specific way.
As for the -no-pie bit of the question, that comes down to how position-independent and non-position-independent executables are loaded. Your UEFI bootloader is, among other things, a loader. There is a flag in the executable telling the loader whether or not it's a PIE. If it's not, then the loader just has to use the exact addresses encoded in the file. In this case, the elf->entry pointer will be exactly correct. If it is a PIE, then the loader can place it at whatever memory address it likes, in which case the elf->entry pointer will be relative to the address at which the executable is loaded. That's why you need to use base + elf->entry when you don't provide the -no-pie flag.
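As a rough illustration of that decision, here is a minimal sketch of what a loader could do. It assumes the "flag" is the e_type field of the ELF header (ET_EXEC for a fixed-address executable, ET_DYN for a PIE) and borrows the Elf64_Ehdr definition from Linux's elf.h; a real UEFI bootloader would carry its own copy of that struct. The function name entry_address and the load_base parameter are made up for the example.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: the address a loader should jump to.
   load_base is wherever the loader mapped the image's virtual address 0. */
uint64_t entry_address(const Elf64_Ehdr *eh, uint64_t load_base) {
    if (eh->e_type == ET_DYN) {
        /* PIE: e_entry is relative to the load base chosen by the loader. */
        return load_base + eh->e_entry;
    }
    /* ET_EXEC (built with -no-pie): e_entry is already the absolute address
       the linker picked, so the image must be placed where it expects. */
    return eh->e_entry;
}
```

With -no-pie the second branch applies, which is why casting elf->entry directly to a function pointer works only as long as the kernel's segments really are at the addresses the linker assigned.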
I have some example code here which I'm using to understand some C behaviour for a beginner's CTF:
// example.c
#include <stdio.h>
#include <stdlib.h>
void main() {
void (*print)();
print = getenv("EGG");
print();
}
Compile: gcc -z execstack -g -m32 -o example example.c
Usage: EGG=$(echo -ne '\x90\xc3') ./example
If I compile the code with the execstack flag, the program will execute the opcodes I've injected above. Without the flag, the program will crash due to a segmentation fault.
Why exactly is this? Is it because getenv is storing the actual opcodes on the stack, and the execstack flag allows jumps to the stack? Or does getenv push a pointer onto the stack, and there are some other rules about what sections of memory are executable? I read the manpage, but I couldn't work out exactly what the rules are and how they're enforced.
Another issue is that I think I'm also really lacking a good tool to visualise memory whilst debugging, so it's hard to figure this out. Any advice would be really appreciated.
getenv doesn't store the env var's value on the stack. It's already on the stack from process startup, and getenv obtains a pointer to it.
See the i386 System V ABI's description of where argv[] and envp[] are located at process startup: above [esp].
_start doesn't copy them before calling main, just calculates pointers to them to pass as args to main. (Links to the latest version at https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI, where the official current version is maintained.)
Your code is casting a pointer to stack memory (containing the value of an env var) into a function pointer and calling through it. Look at the compiler-generated asm (e.g. on https://godbolt.org/): it'll be something like call getenv / call eax.
-zexecstack in your kernel version (see footnote 1) makes all your pages executable, not just the stack. It also applies to .data, .bss, and .rodata sections, and memory allocated with malloc / new.
The exact mechanism on GNU/Linux was a "read-implies-exec" process-wide flag that affects all future allocations, including manual use of mmap. See "Unexpected exec permission from mmap when assembly files included in the project" for more about the GNU_STACK ELF header stuff.
Footnote 1: Linux after 5.4 or so only makes the stack itself executable, not READ_IMPLIES_EXEC; see "Linux default behavior of executable .data section changed between 5.4 and 5.9?"
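If you want to check which of these cases applies on your own system, one option is to look up an address in /proc/self/maps. The following is a minimal sketch, assuming Linux with /proc mounted; region_perms is a made-up helper name, not a library function. It copies the permission string (for example "rw-p" or "r-xp") of the mapping that contains the given address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the mapping containing addr and copy its
   permission string into perms. Returns 0 on success, -1 if not found. */
int region_perms(const void *addr, char perms[5]) {
    assert(perms != NULL);
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[512];
    int result = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char p[5];
        /* Each line starts with: start-end perms, e.g. 7ffc...-7ffd... rw-p */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) != 3)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            memcpy(perms, p, sizeof p);
            result = 0;
            break;
        }
    }
    fclose(maps);
    return result;
}
```

Calling it on the pointer returned by getenv("EGG") should show whether the third character is 'x' with and without -z execstack. In GDB, info proc mappings gives a similar view of the whole address space.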
Fun fact: taking the address of a nested function that accesses its parent's local variables gets gcc to enable -zexecstack. It stores code for an executable "trampoline" onto the stack that passes a "static chain" pointer to the actual nested function, allowing it to reference its parent's stack frame.
If you wanted to exec data as code without -zexecstack, you'd use mprotect(PROT_EXEC|PROT_READ|PROT_WRITE) on the page containing that env var. (It's part of your stack so you shouldn't remove write permission; it could be in the same page as main's stack frame for example.)
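A minimal sketch of that mprotect call, assuming a POSIX system; make_range_executable is a made-up helper name. mprotect needs a page-aligned start address, so the pointer is rounded down to its page boundary first. Some hardened systems refuse writable-and-executable mappings, in which case the call fails and returns -1.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: make every page overlapping [addr, addr + len)
   readable, writable and executable. Returns 0 on success, -1 on error. */
int make_range_executable(void *addr, size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1;

    /* Round the start down to the beginning of its page. */
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page - 1);
    uintptr_t end = (uintptr_t)addr + len;

    /* Keep PROT_WRITE: these pages may also hold live stack data. */
    return mprotect((void *)start, end - start,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

In the example program you would call it on the pointer returned by getenv("EGG"), with the length of the string, before calling through the function pointer.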
Related:
With GNU/Linux ld from binutils before late 2018 or so, the .rodata section is linked into the same ELF segment as the .text section, and thus const char code[] = {0xc3} or string literals are executable.
Current ld gives .rodata its own segment that's mapped read without exec, so finding ROP / Spectre "gadgets" in read-only data is no longer possible, unless you use -zexecstack. And even that doesn't work on current kernels; char code[] = ...; as a local inside a function will put data on the stack where it's actually executable. See "How to get c code to execute hex machine code?" for details.
I compiled a hello world C program and this is the file information :
hello: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=3c4fc3bc82d53281357312935790846333a3c7bc, with debug_info, not stripped
When I check the segment header information, I see that the VirtAddr for the LOAD segment is pointing to the address 0x0000000000000000 which is defined as NULL. The entry address is 0x540 which indicates that it lies in the first LOAD segment of the two. The E (execute) flag and the .text section is also mapped to the first LOAD segment.
When I use gdb and set a breakpoint at main, I see the address change, which means the addresses have been shifted by a certain offset. Why did this happen? I tried loading the program multiple times, but the offset remains constant, which means there is no address randomization happening. I see other questions on SO which get load addresses just the opposite of mine. Why? The same thing happens when I compile with -m32. Did something change in Linux over the past years, so that I get different output from the linked question?
You're seeing 0 for LOAD because your ELF is position independent.
Modern versions of GCC generate Position Independent Executables by default (unless configured otherwise). If the executable is PIE, the base virtual address in the ELF headers is set to 0. When you run your program under GDB, it temporarily disables address randomization and loads your program at the default address 0x0000555555554000.
If you want to compile a non-PIE executable you can use the -no-pie -fno-pie compilation flags.
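To check which kind of file you ended up with, you can read the e_type field of the ELF header, which is the 16-bit value right after the 16-byte e_ident array. Here is a minimal sketch, assuming Linux's elf.h for the constants; elf_type_of is a made-up helper name. A PIE has e_type ET_DYN, the same value shared libraries use, which is why file reports "shared object" for it; a -no-pie build has ET_EXEC.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return e_type (ET_EXEC, ET_DYN, ...) of the ELF
   file at path, or -1 if it cannot be read or is not an ELF file. */
int elf_type_of(const char *path) {
    unsigned char header[EI_NIDENT + 2];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(header, 1, sizeof header, f);
    fclose(f);

    if (got != sizeof header || memcmp(header, ELFMAG, SELFMAG) != 0)
        return -1;

    /* e_type is stored in the byte order named by e_ident[EI_DATA]. */
    if (header[EI_DATA] == ELFDATA2MSB)
        return (header[EI_NIDENT] << 8) | header[EI_NIDENT + 1];
    return header[EI_NIDENT] | (header[EI_NIDENT + 1] << 8);
}
```

For the hello binary in the question this should return ET_DYN; after rebuilding with -no-pie -fno-pie it should return ET_EXEC. readelf -h shows the same field as "Type".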
I need to find the entry point of an ELF file running in a virtual environment (Debian x86). On the host machine I can determine the current instruction pointer and all the other CPU registers, so it's possible to detect kernel function calls.
The function load_elf_binary calls the function start_thread with the following parameters:
start_thread(regs, elf_entry, bprm->p);
So I should be able to receive elf_entry from the cpu registers, but when I dump the registers, I get this output:
EAX=0xc61bffb4
EBX=0xc61bffb4
ECX=0xbff29430
EDX=0xb78ae850
ESI=0xc78f9500
EDI=0xbff29fec
EBP=0xbff29488
ESP=0xc61bfeb4
EIP=0xc1001f82
CR2=0xb78ca840
No register has a value starting with 0x8xxxxxxx, so I'm not sure if I made a mistake or the elf_entry isn't the entry point I'm looking for.
The essential question: is the variable elf_entry in the kernel function load_elf_binary the entry point I get with readelf -h?
elf_entry in that function corresponds to the ELF entry point only for static binaries. For dynamic binaries, it corresponds to the entry point of the dynamic linker (which eventually calls the binary's entry point); see http://lxr.free-electrons.com/source/fs/binfmt_elf.c?v=2.6.32#L888.
After compiling a new kernel with a printk added, I learned that new_ip always starts with 0xb78xxxxxx, so elf_entry and the entry point of an ELF file aren't the same.
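The choice described here can be modelled in a few lines. This is a simplified sketch, not the kernel's code: it assumes a 32-bit ELF as in the question, uses Linux's elf.h for the struct definitions, and treats the presence of a PT_INTERP program header as the sign of a dynamic binary. The names has_interpreter and initial_entry, and the interp_load_base and interp_e_entry parameters, are made up for the example.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the program headers request a dynamic linker (PT_INTERP). */
int has_interpreter(const Elf32_Phdr *phdrs, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type == PT_INTERP)
            return 1;
    }
    return 0;
}

/* Simplified model of the first user-space instruction pointer:
   the dynamic linker's entry if there is one, else the binary's own. */
uint32_t initial_entry(const Elf32_Ehdr *exe,
                       const Elf32_Phdr *phdrs, size_t count,
                       uint32_t interp_load_base, uint32_t interp_e_entry) {
    if (has_interpreter(phdrs, count))
        return interp_load_base + interp_e_entry;
    return exe->e_entry; /* the value readelf -h reports */
}
```

For a dynamically linked binary the result lands in the region where the dynamic linker is mapped, not at the 0x8xxxxxxx address readelf -h reports.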