Stripped binary shows "_cxa_finalize" instead of "libc_start_main" - c

Why stripped binary shows _cxa_finalize instead of libc_start_main?
I am trying to locate and disassemble main() in a very simple C program on Linux (Ubuntu). The binary is stripped. Below you can see disassembly (not stripped) vs disassembly (stripped) of the same instructions.
Question: what is _cxa_finalize in the stripped version? Why is libc_start_main is replaced by _cxa_finalize?
Not stripped:
106d: 48 8d 3d c1 00 00 00 lea rdi,[rip+0xc1] # 1135 <main>
1074: ff 15 66 2f 00 00 call QWORD PTR [rip+0x2f66] # 3fe0 <__libc_start_main#GLIBC_2.2.5>
Stripped:
106d: 48 8d 3d c1 00 00 00 lea rdi,[rip+0xc1] # 1135 <__cxa_finalize#plt+0xf5>
1074: ff 15 66 2f 00 00 call QWORD PTR [rip+0x2f66] # 3fe0 <__cxa_finalize#plt+0x2fa0>

It's not __cxa_finalize. It's __cxa_finalize#plt+0xf5 and __cxa_finalize#plt+0x2fa0 (notice the significant offsets). The disassembler has no information about the symbol main or __libc_start_main because you removed the symbol table, but for technical reasons it is still aware of the symbols assocated with PLT thunks (because they're needed for binding at dynamic linking time, and the disassembler probably falls back to using that information when it lacks s symbol table). In general, the disassembler works backward from an address until it finds an address named by a symbol, and assumes (wrongly, here) that the address being disassembled is part of that function.

Related

In shared objects, why does gcc relocate through the GOT for global variables which are defined in the same shared object?

While answering another question on Stack, I came upon a question myself. With the following code in shared.c:
#include "stdio.h"
int glob_var = 0;
void shared_func(){
printf("Hello World!");
glob_var = 3;
}
Compiled with:
gcc -shared -fPIC shared.c -oshared.so
If I disassemble the code with objdump -D shared.so, I get the following for shared_func:
0000000000001119 <shared_func>:
1119: f3 0f 1e fa endbr64
111d: 55 push %rbp
111e: 48 89 e5 mov %rsp,%rbp
1121: 48 8d 3d d8 0e 00 00 lea 0xed8(%rip),%rdi # 2000 <_fini+0xebc>
1128: b8 00 00 00 00 mov $0x0,%eax
112d: e8 1e ff ff ff callq 1050 <printf#plt>
1132: 48 8b 05 9f 2e 00 00 mov 0x2e9f(%rip),%rax # 3fd8 <glob_var##Base-0x54>
1139: c7 00 03 00 00 00 movl $0x3,(%rax)
113f: 90 nop
1140: 5d pop %rbp
1141: c3 retq
Using readelf -S shared.so, I get (for the GOT):
[22] .got PROGBITS 0000000000003fd8 00002fd8
0000000000000028 0000000000000008 WA 0 0 8
Correct me if I'm wrong but, looking at this, the relocation for accessing glob_var seems to be through the GOT. As I read on some websites, this is due to limitations in x86-64 machine code where RIP-relative addressing is limited to a 32 bits offset. This explanation is not satisfactory to me because, since the global variable is in the same shared object, then it is guaranteed to be found in its own data segment. The global variable could thus be accessed using RIP-relative addressing without an issue.
I would understand the GOT relocation if glob_var had been declared extern but, in this case, it is defined in the same shared object. Why does gcc require a relocation through the GOT? Is it because it is not able to detect that the global variable is defined within the same shared object?
Related: Why are nonstatic global variables defined in shared objects referenced using GOT?
The above is 11 years old and doesn't answer my question because there doesn't seem to be an appropriate answer there.
Bonus: what does <glob_var##Base-0x54> mean in the disassembly of shared_func? Why it isn't <glob_var#GOT>?
Thanks for any help!

Minimal 64-bit Windows executable crashes with tail-call optimization enabled by gcc

I'm trying to create a minimal 64-bit Windows executable to better understand how the Windows executable format works.
I wrote very basic assembly and C code as follows.
hi.s
section .text
hi:
db "hi", 0
global sayHi
align 16
sayHi:
lea rax, [rel hi]
ret
start.c
extern int puts();
extern const char *sayHi();
void start() {
puts(sayHi());
}
compiled with,
nasm -fwin64 hi.s
gcc -c -ostart.obj -O3 -fno-optimize-sibling-calls start.c
# I will explain the flag
and linked with,
golink /fo r.exe /console start.obj hi.obj msvcrt.dll
# create a console application `r.exe`
# the default entry point is `start`
The program runs fine and prints hi, but note the gcc flag -fno-optimize-sibling-calls. That flag disables tail-call optimizations so that the program always allocates stack space and calls a function. Without the flag, the program crashes.
This is the disassembled result without tail-call optimization. Not sure why gcc put a nop there, but otherwise it's very simple and runs fine.
0000000000401000 <.text>:
401000: 48 83 ec 28 sub rsp,0x28
401004: e8 27 00 00 00 call 0x401030 # sayHi
401009: 48 89 c1 mov rcx,rax
40100c: e8 ff 2f 00 00 call 0x404010 # puts
401011: 90 nop
401012: 48 83 c4 28 add rsp,0x28
401016: c3 ret
...
401020: 68 69 00 90 90 push 0xffffffff90900069 # "hi"
...
401030: 48 8d 05 e9 ff ff ff lea rax,[rip+0xffffffffffffffe9] # 0x401020
401037: c3 ret
This is when tail-call opt is enabled, in which the program crashes.
0000000000401000 <.text>:
401000: 48 83 ec 28 sub rsp,0x28
401004: e8 27 00 00 00 call 0x401030 # sayHi
401009: 48 89 c1 mov rcx,rax
40100c: 48 83 c4 28 add rsp,0x28
401010: e9 eb 2f 00 00 jmp 0x404000 # puts
...
401020: 68 69 00 90 90 push 0xffffffff90900069 # "hi"
...
401030: 48 8d 05 e9 ff ff ff lea rax,[rip+0xffffffffffffffe9] # 0x401020
401037: c3 ret
Now the program doesn't allocate stack space before puts and simply does a jmp instead of call.
I investigated further to see where exactly it jumps when calling puts.
In the no-tail-call case, the called address 0x404010 in the .idata section has the instruction jmp QWORD PTR [rip+0xffffffffffffffea] # 0x404000, and 0x404000 seems to contain the address to puts.
However in the tail-call case, the called address 0x404000 has 54 40 00 00 which is no meaningful instruction. The debugger says the program segfaults at 0x404003, so I'm pretty sure the program chokes trying to execute a garbage instruction.
I must be doing something wrong, but I'm not sure which, so could you explain why the tail-call case fails and how to get it work?
The problem was on golink not correctly handling tail-calls. I searched a while to make GNU ld link the program with the same options given to golink.
You can create a console-mode Windows executable by GNU ld with this command.
ld -o... --subsystem=console object-files...
--subsystem console or -subsystem=console also means the same. Use --subsystem=windows to create a GUI application.
GNU ld also handles Windows dll files, so in this case, simply giving ld a copy of msvcrt.dll from the system folder worked.

Why static string in .rodata section has a four dots prefix in GCC?

For the following code:
#include <stdio.h>
int main() {
printf("Hello World");
printf("Hello World1");
return 0;
}
the generated assembly for calling printf is as follows (64 bits):
400474: be 24 06 40 00 mov esi,0x400624
400479: bf 01 00 00 00 mov edi,0x1
40047e: 31 c0 xor eax,eax
400480: e8 db ff ff ff call 400460 <__printf_chk#plt>
400485: be 30 06 40 00 mov esi,0x400630
40048a: bf 01 00 00 00 mov edi,0x1
40048f: 31 c0 xor eax,eax
400491: e8 ca ff ff ff call 400460 <__printf_chk#plt>
And the .rodata section is as follows:
Contents of section .rodata:
400620 01000200 48656c6c 6f20576f 726c6400 ....Hello World.
400630 48656c6c 6f20576f 726c6431 00 Hello World1.
Based on the assembly code, the first call for printf has the argument with address 400624 which has a 4 byte offset from the start of .rodata. I know it skips the first 4 bytes for these 4 dots prefix here. But my question is why GCC/linker produce this prefix for string in .rodata ? I am using 4.8.4 GCC on Ubuntu 14.04. The compilation cmd is just: gcc -Ofast my-source.c -o my-program.
For starters, those are not four dots, the dot just means unprintable character. You can see in the hex dump that those bytes are 01 00 02 00.
The final program contains other object files added by the linker, which are part of the C runtime library. This data is used by code there.
You can see the address is 0x400620. You can then try to find a matching symbol, for example you can load it into gdb and use the info symbol command:
(gdb) info symbol 0x4005f8
_IO_stdin_used in section .rodata of /tmp/a.out
(Note I had a different address.)
Taking it further, you can actually find the source for this in glibc:
/* This records which stdio is linked against in the application. */
const int _IO_stdin_used = _G_IO_IO_FILE_VERSION;
and
#define _G_IO_IO_FILE_VERSION 0x20001
Which corresponds to the value you see if you account for little-endian storage.
It does not prefix the data. The .rodata can contain anything. The first four bytes are [seemingly] a string, but it just happens to link there (i.e. it's for something else). It is unrelated to your "Hello World"

Generating simple shell binary code to be copied to the stack for stack overflow

I am trying to implement the buffer overflow attack, but I need to generate instruction code in binary so I can put it on the stack to be executed. My problem is that the instructions that I am getting now have jumps to different parts of a program which becomes hard to put on the stack. So I have this simple piece of code (not the code to be exploited) that I want to put on the stack to spawn a new shell.
#include <stdio.h>
int main( ) {
char *buf[2];
buf[0] = "/bin/bash";
buf[1] = NULL;
execve(buf[0], buf, NULL);
}
The code is being compiled with gcc with the following flags:
CFLAGS = -Wall -Wextra -g -fno-stack-protector -m32 -z execstack
LDFLAGS = -fno-stack-protector -m32 -z execstack
Finally using objdump -d -S, I get the following code (parts of it) in hex:
....
....
08048320 <execve#plt>:
8048320: ff 25 08 a0 04 08 jmp *0x804a008
8048326: 68 10 00 00 00 push $0x10
804832b: e9 c0 ff ff ff jmp 80482f0 <_init+0x3c>
....
....
int main( ) {
80483e4: 55 push %ebp
80483e5: 89 e5 mov %esp,%ebp
80483e7: 83 e4 f0 and $0xfffffff0,%esp
80483ea: 83 ec 20 sub $0x20,%esp
char *buf[2];
buf[0] = "/bin/bash";
80483ed: c7 44 24 18 f0 84 04 movl $0x80484f0,0x18(%esp)
80483f4: 08
buf[1] = NULL;
80483f5: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
80483fc: 00
execve(buf[0], buf, NULL);
80483fd: 8b 44 24 18 mov 0x18(%esp),%eax
8048401: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
8048408: 00
8048409: 8d 54 24 18 lea 0x18(%esp),%edx
804840d: 89 54 24 04 mov %edx,0x4(%esp)
8048411: 89 04 24 mov %eax,(%esp)
8048414: e8 07 ff ff ff call 8048320 <execve#plt>
}
8048419: c9 leave
804841a: c3 ret
804841b: 90 nop
As you can see this code is hard to copy onto the stack. execve jumps to a different part of the assembly code to be executed. Is there a way to nicely get a program which can be put compactly on the stack without too much space and branches being used?
If you want a clean assembly code without coding it yourself, do the following:
Move your code to a separate function
Pass the -O0 compilation flag to gcc in order to prevent optimizations
Compile just an object file - use gcc [input file] -o [output file]
Follow these steps, and use the assembly code generated from objdump.
Please keep in mind, you have an external dependency for execve:
8048414: e8 07 ff ff ff call 8048320 <execve#plt>
so you must either include its implementation explicitly and remove the call, or know in advance that the process you want to attack has this function in its address space, and modify the call address to match the process execve address.
Welcome to Stack Overflow (pun intended).
This is an unusual request. Are you trying to get hired by the NSA?
Compilers don't structure assembly code in a very human friendly form, obviously. And their idea of optimization might be for performance rather than for compactness. Therefore, you might consider hand-coding it in assembler, using the compiler output as a guideline to achieve the effect you're going for.
What you have there may not be the best representation of what the compiler can do to inform your investigation in any case. Put the code into a function other than main, so you can just get the minimal stack setup and argument handling necessary, and try compiling it with all the different optimization levels to see what it does.
You're probably getting some extra setup overhead for putting it in main(), because it's the main entry point to your program and has to interface with libc and the OS (just guessing), and may be setting things up for the program's operational context in general (which you could presume would have been done for whichever executable you were inserting the code into so it would be redundant).

Absolute Jumps Within Shared Object Code Unix

I have a question regarding the handling and interpretation of shared libraries.
Suppose, I build a shared object from foo.c using the command:
gcc -shared -fPIC -o libfoo.so foo.c
where foo.c consists of:
#include <stdlib.h>
#include <stdio.h>
int main(void) {
int i;
printf("this is a silly test\n");
if(i)
goto ret;
printf("hello world\n");
ret:
return 0;
}
Now, let's look at the objdump output, specifically that of foo's main:
0000000005ec <main>:
5ec: 55 push %rbp
5ed: 48 89 e5 mov %rsp,%rbp
5f0: 48 83 ec 10 sub $0x10,%rsp
5f4: 48 8d 3d 6b 00 00 00 lea 0x6b(%rip),%rdi # 666 <_fini+0xe>
5fb: e8 00 ff ff ff callq 500 <puts#plt>
600: 83 7d fc 00 cmpl $0x0,-0x4(%rbp)
604: 75 0e jne 614 <main+0x28>
606: 48 8d 3d 65 00 00 00 lea 0x65(%rip),%rdi # 672 <_fini+0x1a>
60d: e8 ee fe ff ff callq 500 <puts#plt>
612: eb 01 jmp 615 <main+0x29>
614: 90 nop
615: b8 00 00 00 00 mov $0x0,%eax
61a: c9 leaveq
61b: c3 retq
61c: 90 nop
61d: 90 nop
61e: 90 nop
61f: 90 nop
I can clearly see that calls to puts are being redirected to the PLT, as expected. However, what I don't understand are the instructions at 604 and 612. They are not relative to the IP, nor a call to the PLT. They use an absolute address, based on the
symbol main.
How could this shared library then possibly be used simultaneously betwen several processes? It could (and should) be loaded at different virtual addresses, but the point is that each process should share the implementation stored in RAM. How can different processes with main loaded at different virtual addresses share the instructions at 604 and 612?
They're not absolute jumps, they're PC relative jumps. In fact ALL direct jump and call instructions are PC-relative on x86 -- there are NO absolute direct jumps (so if you want an absolute jump, it has to be indirect).
The reason the callq instruction use the PLT is because the target symbol might be in a different shared object, so even a relative branch won't work (the other shared object might be loaded at any address, independently of this shared object). The PLT itself is actually a small piece of code within the shared object, with a single (indirect) absolute jump for each remote symbol. When the shared objects a dynamically loaded, the absolute address is set appropriately, so when the code runs, the callq instruction will be a pc-relative branch (call) to the PLT which consists of a single indirect jump to the puts routine.

Resources