When I link together object files, the resulting ELF executable has (only) the following LOAD segment:
LOAD off 0x00000000 vaddr 0x00008000 paddr 0x00008000 align 2**15
filesz 0x000010f0 memsz 0x000010f0 flags rwx
The linker ld combined all sections into one rwx segment, instead of separating writable and executable parts. I want to prevent this. The relocatable objects have the sections marked as writable or executable as appropriate, so this problem appears at the link time.
So, what determines how ld assigns permissions to the segments? The linker script does not seem to have anything related to that. Is it something specified when the toolchain is built?
I am targeting the ARM, and the toolchain is arm-linux-gnueabi, binutils version 2.22.
Edit: The linker script is here. The other linker options are -Bdynamic, --gc-sections, -z nocopyreloc and --no-undefined.
Usually linker scripts are used to define sections, their order, whether to keep them or trash them from the binary (release build linker script might chose to remove debugging info).
Here is an example of a linker script:
system-onesegment.ld
EDIT: to modify section permission, put this before whatever section you want to change the permission of .section .section_name, "permission".
Example:
.text
mov r0, r0
b main
.section .rodata, "ro"
.word 0x00000001
.previous
main: mov r0, r0
Related
So I am trying to write some code using x86 and I can't seem to get it to move contents of a register to a spot in memory.
The code is just this
global main
SECTION .DATA
var_i: DD 0
SECTION .TEXT
main:
push DWORD 4
pop EAX
mov [var_i], EAX
mov EAX, 0
ret
I am using nasm and gcc on the code.
The problem I am having is that whenever I try to move to the spot in memory it segfaults
What kind of system/object format are you using? I'm guessing you're using ELF on Linux or Unix, as that would explain your problem:
Section names in ELF are case sensitive, and most ELF-based OS's the special sections .text and .data are understood, but your sections .TEXT and .DATA have no meaning. As a result, they just get stuck into the executable after the other sections and get the same access permissions. If you're just linking the above code, that will be after the .fini section, so it will executable and read-only. So when you try to write to the variable, you get a segfault.
Change your code to use .data and .text as section names and it should work.
I'm trying to generate a static executable for this program (with musl):
main.S:
.section .text
.global main
main:
mov $msg, %rdi
mov $0, %rax
call printf
mov %rax, %rdi
mov $60, %rax
syscall
msg:
.ascii "hello world from printf\n\0"
Compilation command:
clang -g -c main.S -o main.o
Linking command (musl libc is placed in musl directory (version 1.2.1)):
ld main.o musl/crt1.o -o sm -Tstatic.ld -static -lc -lm -Lmusl
Linker script (static.ld):
ENTRY(_start)
SECTIONS
{
. = 0x100e8;
}
This config results in a working executable, but if I change the location counter offset to 0x10000 or 0x20000, the resulting executable crashes during startup with a segfault. On debugging I found that musl initialization code tries to read the program headers (location received in aux vector), and for some reason the memory address of program header as given by aux vector is unmapped in our address space.
What is the cause of this behavior? What exactly is the counter offset in a linker script? How does it affect the linker output other than altering the load address?
Note: The segfault occurs when the the musl initialization code tries to access program headers
There are a few issues here.
Your main.S has a stack alignment bug: on x86_64, you must realign the stack to 16-byte boundary before calling any other function (you can assume 8-byte alignment on entry).
Without this, I get a crash inside printf due to movaps %xmm0,0x40(%rsp) with misaligned $rsp.
Your link order is wrong: crt1.o should be linked before main.o
When you don't leave SIZEOF_HEADERS == 0xe8 space before starting your .text section, you are leaving it up to the linker to put program headers elsewhere, and it does. The trouble is: musl (and a lot of other code) assumes that the file header and program headers are mapped in (but the ELF format doesn't require this). So they crash.
The right way to specify start address:
ENTRY(_start)
SECTIONS
{
. = 0x10000 + SIZEOF_HEADERS;
}
Update:
Why does the order matter?
Linkers (in general) will assemble initializers (constructors) left to right. When you call standard C library routines from main(), you expect the standard library to have initialized itself before main() was called. Code in crt1.o is responsible for performing such initialization.
If you link in the wrong order: crt1.o after main.o, construction may not happen correctly. Whether you'll be able to observe this depends on implementation details of the standard library, and exactly what parts of it you are using. So your binary may appear to work correctly. But it is still better to link objects in the correct order.
I'm leaving 0x10000 space, isn't it enough for headers?
You are interfering with the built-in default linker script, and instead giving it incomplete specification of how to lay out your program in memory. When you do so, you need to know how the linker will react. Different linkers will react differently.
The binutils ld reacts by not emitting a LOAD segment covering program headers. The ld.lld reacts differently -- it actually moves .text past program headers.
The resulting binaries still crash though, because the binary layout is not what the kernel expects, and the kernel-supplied AT_PHDR address in the aux vector is wrong.
It looks like the kernel expects the first LOAD segment to be the one which contains program headers. Arguably that is a bug in the kernel -- nothing in the ELF spec requires this. But all normal binaries do have program headers in the first LOAD segment, so you'll just have to do the same (or convince kernel developers to add code to handle your weird binary layout).
I'm using a cross compiler on a 64-bit Intel-based Linux system to build some of our software so it can run on a 32-bit PowerPC chip. The cross compiler was produced by Crosstools.
When I run "readelf -a" against the shared object files (.so files) produced by the cross compiler, part of the output shows this:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x9a87c 0x9a87c R E 0x10000
LOAD 0x09a87c 0x000aa87c 0x000aa87c 0x01344 0x03230 RWE 0x10000
DYNAMIC 0x09ba84 0x000aba84 0x000aba84 0x000d0 0x000d0 RW 0x4
GNU_EH_FRAME 0x09a7bc 0x0009a7bc 0x0009a7bc 0x0002c 0x0002c R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
The problem is that header marked RWE. A potential customer evaluating our software has issues with that and wants it to be just RW.
A second cross compiler, produced by the same version of Crosstools and targeting the same version of gcc, produces code for 64-bit PowerPC chips. The shared object files produced by this cross compiler do not produce any RWE headers (the second LOAD header is marked just RW).
The qualifiers for gcc for both the compiles and the links are the same in both cases.
I'm kind of new to the world of cross compilers and ELF headers. Is there a way to get the 32-bit cross compiler to create shared object files without a header marked RWE?
Failing that, is there a way to (safely) patch an already created .so file to change a header marked RWE so it's marked RW?
Can you check the section-to-program-header mapping (eu-readelf -l)? I'm pretty sure this happens because your target configuration doesn't enable by default the secure GOT feature that was implemented for GNU/Linux some time ago:
powerpc new PLT and GOT
See --bss-plt and --secure-plt in ld and PowerPC 32-bit ELF Support for flags to control this behavior. If you use a custom program loader, you may have to tweak it for secure PLT support. The glibc dynamic linker supports it.
(And your customer is very astute, congratulations.)
EDIT If --secure-plt does not work, your PowerPC sub-target is either incompatible with it, you built the toolchain incorrectly, or there's a linker script which overrides its effect.
I don't have much experience with linker scripts, so maybe I'm just misunderstanding something here. The LD linker script for the ATmega32u4 (avr5.x) specifies that data memory (SRAM) starts at 0x800060. I know that 0x800000 is just a special offset that the compiler uses for SRAM pointers, but why use 0x60 as the base of RAM? The ATmega32u4 datasheet shows that 0x60 is the start of external I/O registers and that SRAM actually starts at 0x100. Doesn't that mean the external I/O registers will be clobbered when the .data section is copied into SRAM?
Apparently, this is something GCC accounts for if you invoke avr-ld through it. GCC overrides the virtual address of the .data section depending on what you passed to -mmcu=. If you link with avr-ld directly, the .data section is at the default 0x800060.
It is possible to move some of the functions in the code in a specific section
on the executable? If so, how?
For an application compiled with gcc, we have more source files, including
X.c. Each object is compiled from the associated source (X.o is obtained from X.c) and the linker produces a big executable.
I need two functions from X.c to be in a specific section in the
executable, say .magic_section. The reason I want this is
that the section will be loaded in another area of memory than the rest of the sections.
My problem is that I can not change the source X.c, otherwise I would have used
a specific flag, such as __attribute__ ((section ("magic_section"))) for
the functions.
I read something in the documentation for the linker and wrote a custom script for the linker, but I failed to specify in which section a particular symbol must be placed. I only managed to move a whole section.
On way you could do probably do it (not great, but should work in theory) is to use --function-sections and --data-sections, assuming your GCC version / architecture supports those options, and then manually call out all the functions & variables that need to go in a given file with a linker script.
This creates sections called like things .text.function_name or .data.variable_name. If you're familiar with assigning sections via gcc attributes, I'll assume you know what to do in the linker.
As an advantage, that would let you cherry-pick functions if you don't actually want the entire file to go in a magic section.
Unfortunately, without modifying your binary objects, dynamic linker or dynamic loader you will not be able to accomplish this, and anyhow, this is a very difficult task.
Option 1 - ELF manipulation
Each ELF executable is made from sections, which contain the actual code/data/symbol strings/... and segments which help the loader decide things like where to load your code in memory, which symbols this ELF exposes, which symbols it requires from other locations, where to load specific code/data, etc.
You can observe the segments in your binary by typing
readelf -l [your binary]
The output will be similiar to the following (I chose ls as the binary):
[ishaypeled#ishay-dev bin]$ readelf -l --wide ./ls
Elf file type is EXEC (Executable file)
Entry point 0x4048bf
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
INTERP 0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x01b694 0x01b694 R E 0x200000
LOAD 0x01bdf0 0x000000000061bdf0 0x000000000061bdf0 0x000864 0x0016d0 RW 0x200000
DYNAMIC 0x01be08 0x000000000061be08 0x000000000061be08 0x0001f0 0x0001f0 RW 0x8
NOTE 0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R 0x4
GNU_EH_FRAME 0x01895c 0x000000000041895c 0x000000000041895c 0x00071c 0x00071c R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x01bdf0 0x000000000061bdf0 0x000000000061bdf0 0x000210 0x000210 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .init_array .fini_array .jcr .dynamic .got
Now let's examine this output:
In the first table (Program Headers):
[Type] - Segment type, what is the purpose of this section
[Offset] - Offset in file where this segment begins
[VirtAddr] - Where we want to load this section in process address space (if this segment should be loaded at all, not all of them are loaded)
[PhysAddr] - Same as VirtAddr for all modern OS's I encountered
[FileSiz] - How big is this section on file. This is the link to your sections - the current segment consists of all sections in the range Offset to Offset+FileSiz
[MemSiz] - How big is this section in virtual memory (this does NOT have to be the same as the size on file! if it spans beyond the size in file the excess is set to 0)
[Flg] - Permission flags, R-read E-execute W-write.
[Align] - Required memory alignment in memory.
Your focus is on segments of type LOAD (PT_LOAD). These segments group data from sections, instruct the loader where to put them in the process address space and determine specify their permissions.
You can see a convenient section to segment mapping in the Section to Segment mapping table.
Lets observe the two LOAD segments 2 and 3:
We can see that segment 2 has read and execute permissions, and that it spans (among other) the .text and .rodata sections.
So, to achieve your purpose using ELF manipulation:
Locate the binary data that makes your functions in the file (readelf utility is your friend)
By modifying the ELF header (I don't know any tool that does this automatically, you'd probably have to write your own) split the segment containing .text section into two sequential LOAD segments, leaving out your function code
By modifying the ELF header create a new LOAD segment containing only your two functions
Update all references (if any) to the old function location to the new one
If you read up to here and understood everything, you should know this is a tremendously tedious, nearly impossible task for real life cases.
Option 2 - Dynamic linker manipulation
Note the INTERP segment type in the above example. This is an ASCII string that specifies which dynamic linker you should use.
The dynamic linker role is to parse the segments and perform all dynamic operations such as resolving symbols at runtime, loading segments from .so file, etc.
A possible manipulation here would be to modify the dynamic linker code (NOTE: this is a system wide change!) to load the functions binary data into a specific memory address in the process address space. Note that this approach has a couple of set backs:
It requires modification to the dynamic linker
You need to update all references to your functions within the ELF file still
Option 3 - Dynamic loader manipulation
Much like option 2, but modify the ld library facilities instead of the dynamic linker.
Conclusion
Exactly what you wish to do is very hard, and indeed a tedious task. I am working on a tool that allows injection of arbitrary functions into existing shared object files at the moment and I guarantee this to be at least a few good weeks of work.
Are you sure there isn't another way to achieve what you want? Why do you need these two functions in a separate address? Perhaps there is an easier solution...