Disassemble, modify, and reassemble an executable

How can I disassemble an executable on my Mac using ndisasm, then reassemble and link it using nasm and ld?
This is what I tried (I'm running Mac OS X, btw):
*ndisasm a.out | cut -c 29- > main.asm*
this generated clean assembler code with all the processor instructions in main.asm
*nasm -f macho main.asm*
this generated an object file main.o which I then tried to link
*ld main.o*
... this is where I'm stuck. I don't know why it generates the following error:
ld: in section __TEXT,__text reloc 0: R_ABS reloc but no absolute symbol at target address file 'main.o' for inferred architecture i386.
I also tried specifying the architecture (ld -arch x86_64 main.o) but that didn't work either.
My goal is to disassemble any executable, modify it and then reassemble it again.
What am I doing wrong?

There is no reliable way to do this with normal assembler syntax. See How to disassemble, modify and then reassemble a Linux executable?. Section info is typically not faithfully disassembled, so you'd need a special format designed for modifying, reassembling, and relinking.
Also, instruction lengths are a problem when code only works if padded with longer-than-default encodings (e.g. in a table of jump targets for a computed goto). See Where are GNU assembler instruction suffixes like ".s" in x86 "mov.s" documented?, but note that disassemblers don't support disassembling into that format.
ndisasm doesn't understand object file formats, so it disassembles headers as machine code!
For this to have any hope of working, use a disassembler like Agner Fog's objconv, which outputs asm source (NASM, MASM, or GAS AT&T syntax) that does assemble. It might not actually work if any of the code depends on a specific longer-than-default encoding.
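For example, to emit NASM-syntax source from an executable (the -fnasm switch selects NASM output; the output file name here is arbitrary):
objconv -fnasm a.out main.asm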
I'm not sure how faithful objconv is with respect to emitting section .bss, section .rodata and other directives like that to place data where it found it in the object file, but that's what you need.
Re: absolute relocations: make sure you put DEFAULT REL at the top of your file (I forget whether objconv does this by default). x86-64 Mach-O only supports PC-relative relocations, so you have to create position-independent code (e.g. using RIP-relative addressing modes).
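A minimal sketch of what the top of the reassembled source should look like, assuming a 64-bit binary assembled with nasm -f macho64 (the label and data are made up for illustration):
default rel                    ; make memory operands RIP-relative by default
section .data
msg:    db      "hello", 10
section .text
global _main
_main:
        lea     rax, [msg]     ; RIP-relative; no absolute relocation needed
        ret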
ndisasm doesn't read the symbol table, so all its operands use absolute addressing. objconv makes up label names for jump targets and static data that doesn't appear in the symbol table.

Related

Is it possible in practice to compile millions of small functions into a static binary?

I've created a static library with about 2 million small functions, but I'm having trouble linking it to my main function using GCC (tested with 4.8.5 and 7.3.0) under Linux x86_64.
The linker complains about relocation truncations, very much like those in this question.
I've already tried using -mcmodel=large, but as the answer to that same question says, I would
"need a crt1.o that can handle full 64-bit addresses". I then tried compiling one, following this answer, but recent glibc won't compile under -mcmodel=large even if libgcc does, so that accomplishes nothing.
I've also tried adding the flags -fPIC and/or -fPIE to no avail. The best I get is this sole error:
ld: failed to convert GOTPCREL relocation; relink with --no-relax
and adding that flag also doesn't help.
I've searched around the Internet for hours, but most posts are very old and I can't find a way to do this.
I'm aware this is not a common thing to try, but I think it should be possible to do this. I'm working in an HPC environment, so memory or time constraints are not the issue here.
Has anyone been successful in accomplishing something similar with a recent compiler and toolchain?
Either don't use the standard library or patch it. As of version 2.34, glibc doesn't support the large code model (see also the glibc mailing list and Red Hat Bugzilla).
Explanation
Let's examine the glibc source code to understand why recompiling with -mcmodel=large accomplished nothing. Recompiling replaced the relocations originating from C files, but glibc contains hardcoded 32-bit relocations in hand-written assembly files, such as start.S (sysdeps/x86_64/start.S):
call *__libc_start_main@GOTPCREL(%rip)
start.S emits R_X86_64_GOTPCREL for __libc_start_main, which uses RIP-relative addressing. The x86-64 CALL instruction doesn't support relative displacements wider than 32 bits (see AMD64 Manual, Volume 3), so ld couldn't fix up the R_X86_64_GOTPCREL relocation once the code size surpassed 2 GB.
Adding -fPIC didn't help, due to the same ISA constraint: for position-independent code, the compiler still generates relative jumps.
Patching
In short, you have to replace 32-bit relocations in the assembly code. See the System V Application Binary Interface AMD64 Architecture Processor Supplement for more info about implementing 64-bit relocations. See also this for a more in-depth explanation of code models.
Why don't 32-bit relocations suffice for the large code model? Because we can't rely on other symbols being within 2 GB of the call site, all calls must become absolute. Contrast this with the small PIC code model, where the compiler generates relative jumps whenever possible.
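For intuition, here is roughly how a call to a function bar compiles under each code model (a sketch, AT&T syntax):
# small code model: 32-bit PC-relative displacement, target must be within 2 GB
call bar
# large code model: the full 64-bit address is materialized, then called indirectly
movabs $bar, %r11
call *%r11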
Let's look closely at the R_X86_64_GOTPCREL relocation. It holds the 32-bit difference between RIP and the address of the symbol's GOT entry. It has a 64-bit counterpart, R_X86_64_GOTPCREL64, but I couldn't find a way to emit it from assembly.
So, to replace GOTPCREL, we have to compute the symbol's offset from the GOT base as well as the GOT address itself. We can calculate the GOT location once in the function prologue, because it doesn't change.
First, let's get the GOT base (code lifted wholesale from the ABI supplement). Referencing _GLOBAL_OFFSET_TABLE_ with movabs yields the GOT's offset relative to the current position:
leaq 1f(%rip), %r11                     # %r11 = runtime address of label 1
1: movabs $_GLOBAL_OFFSET_TABLE_, %r15  # %r15 = offset of the GOT from label 1
leaq (%r11, %r15), %r15                 # %r15 = absolute address of the GOT
With the GOT base residing in %r15, we now have to find the symbol's GOT entry offset. The R_X86_64_GOT64 relocation specifies exactly this. With it, we can rewrite the call to __libc_start_main as:
movabs $__libc_start_main@GOT, %r11     # %r11 = 64-bit offset of the symbol's GOT entry
call *(%r11, %r15)                      # call through the GOT entry
We replaced R_X86_64_GOTPCREL with a _GLOBAL_OFFSET_TABLE_ reference and R_X86_64_GOT64. Replace the others in the same vein.
N.B.: Replace R_X86_64_GOT64 with R_X86_64_PLTOFF64 for functions that come from dynamically linked libraries.
Testing
Verify the correctness of the patch using the following test, which requires the large code model. Instead of millions of small functions, it contains one huge function and one small one.
Your compiler must support the large code model. If you use GCC, you'll need to build it from source with the flag -mcmodel=large; the startup files mustn't contain 32-bit relocations.
The foo function takes up more than 2 GB, rendering 32-bit relocations unusable, so the test fails with an overflow error if compiled without -mcmodel=large. Also add the flags -O0 -fPIC -static and link with gold (a build sketch follows the listing).
extern int foo();
extern int bar();

int foo() {
    bar();
    // Call sys_exit; the .zero directive pads the function with 4 GB of zero
    // bytes, forcing any cross-function reference to span more than 2 GB.
    asm("mov $0x3c, %%rax \n"
        "xor %%rdi, %%rdi \n"
        "syscall \n"
        ".zero 1 << 32 \n"
        : : : "rax", "rdi", "rcx", "r11");  // syscall clobbers rcx/r11 as well
    return 0;
}

int bar() {
    return 0;
}

int __libc_start_main() {
    foo();
    return 0;
}

int main() {
    return 0;
}
N.B. I used patched Glibc startup files without the standard library itself, so I had to define both __libc_start_main and main.
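For reference, a build invocation along these lines should exercise the test (a sketch only: the exact set of startup objects depends on how you patched glibc, and patched-start.o is a hypothetical name):
gcc -O0 -fPIC -static -mcmodel=large -fuse-ld=gold -nostdlib patched-start.o test.c -o test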

How do I link an object file generated from C code, a static library and a NASM generated object file?

I am working on a program (for real mode) that is loaded by a bootloader to an address in memory, which then jumps to it and starts executing it. The problem is that I have the project separated into two files: a.asm (16-bit asm, NASM syntax) and b.c (which I compile with GCC for DOS, i.e. DJGPP). Also, b.c uses some functions from the Allegro library (I have it as a static library, .a).
My question is, how do I compile and link these 3 files together? My first thought was to:
1. Compile and assemble b.c with gcc (with the -c flag); as a result I get a b.o file.
2. Assemble a.asm with NASM (-f bin or ...?) and get a.o.
3. Link b.o, a.o, and allegro.a to get a pure binary (no .exe headers, no debug information, etc.).
I tried the above approach, but at step 3 the linker throws an error saying that the format of a.o (the object file generated by NASM) is unrecognized; that may be because I am not invoking the right flags and options when assembling the file.
I would like some guidance on how to approach this problem.
Thanks.
The .o file generated by DJGPP contains 32-bit (i386) code, which cannot be called from 16-bit code directly.
Under DOS, 32-bit code is typically run by using a DOS extender, which switches to 32-bit protected mode, sets up memory mappings and DOS API translation (i.e. small trampoline functions that switch back to 16-bit real mode when calling the int 21h DOS API), and then loads and calls the 32-bit code.
Lightweight alternatives to DOS extenders for switching between 16-bit and 32-bit mode:
unreal mode with gcc -m16 (.code16gcc). See this answer and other answers for more details about gcc -m16.
The bootloader of the Syslinux project, which contains 16-bit assembly (NASM), 32-bit assembly (NASM), and 32-bit C (GCC) code, and switches between them.
To link 16-bit and 32-bit code together, you can run objcopy -O binary func.o func.bin on the 32-bit object, and then add incbin "func.bin" to your 16-bit NASM source file, as sketched below. However, this breaks relocations (so you won't be able to use global variables).
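For illustration, the sequence might look like this (file and label names are made up):
objcopy -O binary func.o func.bin
and then, in the 16-bit NASM source:
func32: incbin "func.bin"      ; the 32-bit machine code, embedded verbatim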

Script/Tool predicate for ARM ELF compiled for Thumb OR Arm

I have rootfs and klibc file systems. I am creating make rules, and some developers have an older compiler without interworking.note1 I am trying to verify that all the files get built as ARM-only when a certain version of the compiler is detected. I have re-built the trees several times. I was using readelf -A and looking for Tag_THUMB_ISA_use: Thumb-1, but this seems to appear in ARM-only code (that was nonetheless built with the interworking compiler) as well as in Thumb code. I can manually run objdump -S and examine the assembly to determine which instruction set is in use.
However, it would be much easier if I had a script/tool predicate so that find, etc can be used to search through the shadow file systems to look for binaries that may have been missed. I thought that some of this information would be in the ELF header and accessible via objdump or readelf, but I haven't found anything reliable.
Specifically, I am looking for:
Compiled C that wouldn't run on a Linux system built without CONFIG_ARM_THUMB.
make rules that use C compiler flags that choke non-Thumb compilers.
note1: Interworking allows easy switching between Thumb and ARM modes, and the compiler will automatically generate code to support calling from either mode.
The readelf -A output doesn't describe the ELF contents. It just describes the capabilities of the processor and/or system that is expected or fed to the compiler. As I have an ARM926 CPU, which is an ARMv5TEJ processor, gcc/ld will always set Tag_THUMB_ISA_use: Thumb-1, as it just means that ARMv5TEJ is recognized as being Thumb-1 capable. It says nothing about the code itself.
Examining the Linux arch/arm/kernel/elf.c routine elf_check_arch() shows a check of x->e_entry & 1. This leads to the following script:
readelf -h "$1" | grep -q 'Entry.*[13579bdf]$'
I.e., just look at the initial ELF entry value and see if the low bit is set. This is a fast check that fits the spirit of what I am looking for. unixsmurf has a good point that the code inside any ELF can mix and match ARM and Thumb. This may be OK if the program dynamically identifies the CPU and selects an appropriate routine; i.e., the mere presence of a Thumb instruction doesn't mean that the code will execute.
Just looking at the entry value does determine which gcc compiler flags were used, at least for gcc versions 4.6 to 4.7.
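Wrapped up for use with find, the check might look like this (the script name is made up):
#!/bin/sh
# thumb-entry.sh: succeed if the ELF entry address has its low bit set (Thumb entry)
readelf -h "$1" 2>/dev/null | grep -q 'Entry.*[13579bdf]$'
Then, to list candidate binaries across a shadow file system:
find rootfs -type f -exec sh -c './thumb-entry.sh "$1" && echo "$1"' _ {} \;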
Since Thumb and ARM sequences can be freely interchanged within an object file, even within the same section, plain ELF header inspection is not going to tell you whether a file includes Thumb instructions or not.
A slightly roundabout and still not 100% foolproof way would be to use readelf -r and check whether the output contains "R_ARM_THM", indicating a relocation for Thumb.
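Something along these lines (the binary name is a placeholder):
readelf -r ./mybinary | grep -q R_ARM_THM && echo "contains Thumb relocations"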

Difference in position-independent code: x86 vs x86-64

I was recently building a certain shared library (ELF) targeting x86-64 architecture, like this:
g++ -o binary.so -shared --no-undefined ... -lfoo -lbar
This failed with the following error:
relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
Of course, it means I need to rebuild it as position-independent code, so it's suitable for linking into a shared library.
But this works perfectly well on x86 with exactly the same build arguments. So the question is, how is relocation on x86 different from x86-64 and why don't I need to compile with -fPIC on the former?
I have found a nice and detailed explanation, which boils down to the following (a concrete sketch follows the list):
x86-64 uses IP-relative offsets to load global data; x86-32 cannot, so it dereferences a global offset instead.
IP-relative offsets do not work for shared libraries, because global symbols can be overridden, so x86-64 breaks down when not built with PIC.
When x86-64 is built with PIC, the IP-relative dereference now yields a pointer to a GOT entry, which is then dereferenced.
x86-32, however, already uses a dereference of a global offset, so it is turned into a GOT entry directly.
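To make that concrete, here is roughly how access to a global variable x compiles in each case (a sketch, AT&T syntax; register choices are arbitrary):
# x86-64, non-PIC: data access is RIP-relative (R_X86_64_PC32); taking the
# address can use a 32-bit absolute (R_X86_64_32), the reloc rejected above
movl x(%rip), %eax
movl $x, %edi
# x86-64, PIC: the GOT entry is found RIP-relative, then dereferenced
movq x@GOTPCREL(%rip), %rcx
movl (%rcx), %eax
# x86-32, PIC: the GOT base is kept in a register (typically %ebx)
movl x@GOT(%ebx), %ecx
movl (%ecx), %eax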
It is a code model issue. By default, static code is built assuming the whole program will stay in the lower 2 GB of the memory address space. Code for shared libraries needs to be compiled for another memory model: either PIC, or -mcmodel=large, which compiles without making that assumption.
Note that -mcmodel=large is not implemented in older gcc versions (it is in 4.4, it isn't in 4.2; I don't know about 4.3).
It's purely an arbitrary requirement the ABI people have imposed on us. There's no logical reason why the dynamic linker on x86_64 couldn't support non-PIC libraries. However, since x86_64 is not under such horrible register pressure as x86 (and has better features for PIC), I don't know of any significant reason not to use PIC.

GCC, ARMboot - Creating standalone application without any library and any OS

I have an embedded hardware system which contains a bootloader based on ARMboot (which is very similar to Uboot and PPCboot).
This bootloader normally serves to load a uClinux image from flash. However, now I am trying to use this bootloader to run a standalone helloworld application, which does not require any linked library. Actually, it contains only while(1){} code in the main function.
My problem is that I cannot find out what GCC settings I should use in order to build a standalone, properly formatted binary.
I do use following build command:
cr16-elf-gcc -o helloworld helloworld.c -nostdlib
which produces warning message:
warning: cannot find entry symbol _start; defaulting to 00000004
Thereafter, within the bootloader, I upload a produced application and start it at some address:
tftpboot 0xa00000 helloworld
go 0xa00004
But it doesn't work :(
The system reboots.
Normally it should just hang (because of while(1)).
I don't know that loader, but I think you should use objcopy like this to dump your executable data to a raw binary file. Don't jump to ELF headers, people :)
objcopy -O binary ./a.out o.bin
Also try compiling position-independent code, and read the ld and gcc manuals.
The linker is complaining about missing startup code.
You need to provide two things: startup code and a linker command file that defines the address map of your target processor.
In your case the startup code can be as simple as "bl main", but usually the startup code will at least initialize the stack pointer before branching to main.
If you know you are loading your example into RAM, you can start your program at main directly. You'll need to determine main()'s address and use that for your "go" command.
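A minimal startup stub in that spirit for an ARM target might look like this (the stack address is a placeholder; adjust it to your memory map):
    .globl _start
_start:
    ldr   sp, =0x00A00000    @ placeholder: top of a free RAM region
    bl    main
1:  b     1b                 @ hang if main ever returns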
I work on ARM, no OS and no libraries, all day, every day. These are my current gcc options:
arm-whatever-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -c hello.c -o hello.o
Then I use the linker to combine the C code with the vector tables and such. Even if the image doesn't need a vector table, using one makes it easy to put your entry point on the first instruction.
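The follow-on link and binary-extraction steps would then look something like this (file names and the linker script are placeholders):
arm-whatever-ld -T memmap.ld vectors.o hello.o -o hello.elf
arm-whatever-objcopy -O binary hello.elf hello.bin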
Any reason you can't statically link at least the standard libraries in? You'd then have a working program and the benefits of the standard libraries without external dependencies.
Also, does your toolchain/IDE differentiate between a "standalone application" and a "Linux application"? The IDE for the AVR32 has that distinction and is able to generate either a program that runs within the embedded Linux environment or a standalone program that basically becomes the OS.
