I am trying to modify fs/binfmt_elf.c to fetch the content of a new ELF section I have added to my sample C program.
Sample C program:
#include <stdio.h>
/* Adding my own ELF section*/
char my_custom_section[128][2] __attribute__ ((section (".mysection"))) = { 0 };
int main() {
return 0;
}
Here's the output of readelf -l a.out:
Elf file type is EXEC (Executable file)
Entry point 0x4003d0
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000067c 0x000000000000067c R E 200000
LOAD 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000002f8 0x0000000000000308 RW 200000
DYNAMIC 0x0000000000000e50 0x0000000000600e50 0x0000000000600e50
0x0000000000000190 0x0000000000000190 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x00000000000005ac 0x00000000004005ac 0x00000000004005ac
0x000000000000002c 0x000000000000002c R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 8
GNU_RELRO 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000001d8 0x00000000000001d8 R 1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .mysection .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .ctors .dtors .jcr .dynamic .got
So .mysection is assigned to segment 03 of type LOAD.
However, going through the source of fs/binfmt.c, I am unable to understand how exactly can I fetch the content/value of .mysection. It seems that load_elf_binary() only iterates through program headers and segments, and not section headers or sections.
(which presumably makes sense because it's a loader and hence, views an ELF file as segments and not as sections).
But even when it iterates through the program segments, I don't understand how is it iterating through the sections within a segment.
Basically, I need to fetch the value of my own section .mysection, through modifying fs/binfmt.c, so that whenever an ELF executable with .mynewsection is loaded in my kernel, its value is stored in some variable/CPU registers. Any pointers which can help in doing this job are much appreciated.
EDIT:
My primary goal is to extend the ELF file format into my own custom file format, which I plan to use for a research project on extending the x86 architecture. The executables in that extended architecture need to have a unique identity key used to perform cryptographic operations on its code/data segments. Therefore, a new section in the file could store that key, and the kernel loader can fetch it for further cryptographic processes.
But even when it iterates through the program segments, I don't
understand how is it iterating through the sections within a segment.
Since load_elf_binary() mmaps all the PT_LOAD segments into memory, and those segments contain all the needed sections (e. g. segment 03 contains sections .ctors .dtors .jcr .dynamic .got .got.plt .data .mysection .bss), there is no need for iterating through the sections.
Related
I made this simple C program and compiled it without ASLR
#include <stdio.h>
#include <stdlib.h>
int a = 10;
int b = 20;
int main(int argc, char *argv[])
{
printf("%lx\n",&a);
printf("%lx\n",&b);
return 0;
}
Every time I execute it, the result is the same:
555555558018
55555555801c
Because of that, I am thinking that the data section should start somewhere near to 0x555555558018.
However, when I list the segments of my binary I see the following:
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x1050
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002d8 0x00000000000002d8 R 0x8
INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000618 0x0000000000000618 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000195 0x0000000000000195 R E 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x00000000000000e4 0x00000000000000e4 R 0x1000
LOAD 0x0000000000002dd0 0x0000000000003dd0 0x0000000000003dd0
0x0000000000000250 0x0000000000000258 RW 0x1000
DYNAMIC 0x0000000000002de0 0x0000000000003de0 0x0000000000003de0
0x00000000000001e0 0x00000000000001e0 RW 0x8
NOTE 0x0000000000000338 0x0000000000000338 0x0000000000000338
0x0000000000000020 0x0000000000000020 R 0x8
NOTE 0x0000000000000358 0x0000000000000358 0x0000000000000358
0x0000000000000044 0x0000000000000044 R 0x4
GNU_PROPERTY 0x0000000000000338 0x0000000000000338 0x0000000000000338
0x0000000000000020 0x0000000000000020 R 0x8
GNU_EH_FRAME 0x000000000000200c 0x000000000000200c 0x000000000000200c
0x000000000000002c 0x000000000000002c R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000002dd0 0x0000000000003dd0 0x0000000000003dd0
0x0000000000000230 0x0000000000000230 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .plt.got .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
06 .dynamic
07 .note.gnu.property
08 .note.gnu.build-id .note.ABI-tag
09 .note.gnu.property
10 .eh_frame_hdr
11
12 .init_array .fini_array .dynamic .got
There is not such an address. I think that maybe there is a difference between C pointers (I observe that they consist of 48 bits), and the Virtual Addresses of the segments (that consist of 64 bits). Where are the C pointers actually pointing to?
I have different hypothesis. Once I read that C pointers are actually offsets of their segments (not sure if this is true). The other thing I can think about, is that C pointers are logical addresses, while the segment's virtual addresses refer to the Linear Address Space. See the difference below:
Memory in x86
I'm trying to build a xv6-like system and I'm copying xv6's code below:
if (ph.p_vaddr % PGSIZE) {
cprintf("exec: addr not page aligned.\n");
goto bad;
}
This is the part where ELF is loaded into memory.
It checks every PT_LOAD segment's vaddr and makes sure it's page aligned before load it into the memory.
But the code is confusing because when I use readelf to check my ELF file to load:
Elf file type is EXEC (Executable file)
Entry point 0x400260
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0021f0 0x0021f0 R E 0x1000
LOAD 0x002eb8 0x0000000000403eb8 0x0000000000403eb8 0x000268 0x0008e8 RW 0x1000
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x002eb8 0x0000000000403eb8 0x0000000000403eb8 0x000148 0x000148 R 0x1
Section to Segment mapping:
Segment Sections...
00 .init .text .fini .rodata .eh_frame
01 .init_array .fini_array .data.rel.ro .got .got.plt .data .bss
02
03 .init_array .fini_array .data.rel.ro .got .got.plt
The vaddr is not always page aligned, so I'd like to ask that is it something wrong with my way of compiling or the code itself is wrong?
is it something wrong with my way of compiling or the code itself is wrong?
The code is wrong. .p_vaddr - .p_offset must be page-aligned; .p_vaddr alone does not have to be.
This is because the segment needs to be mmaped, and mmap requires page-aligned offset. In order to mmap the second segment at .p_vaddr, the loader rounds down both .p_vaddr and .p_offset, and mmaps a bit of extra at the beginning of the segment.
There is a remote 64-bit *nix server that can compile a user-provided code (which should be written in Rust, but I don't think it matters since it uses LLVM). I don't know which compiler/linker flags it uses, but the compiled ELF executable looks weird - it has 4 LOAD segments:
$ readelf -e executable
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000004138 0x0000000000004138 R 0x1000
LOAD 0x0000000000005000 0x0000000000005000 0x0000000000005000
0x00000000000305e9 0x00000000000305e9 R E 0x1000
LOAD 0x0000000000036000 0x0000000000036000 0x0000000000036000
0x000000000000d808 0x000000000000d808 R 0x1000
LOAD 0x0000000000043da0 0x0000000000044da0 0x0000000000044da0
0x0000000000002290 0x00000000000024a0 RW 0x1000
...
On my own system all executables that I was looking at only have 2 LOAD segments:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000003000c0 0x00000000003000c0 R E 0x200000
LOAD 0x00000000003002b0 0x00000000005002b0 0x00000000005002b0
0x00000000000776c8 0x000000000009b200 RW 0x200000
...
What are the circumstances (compiler/linker versions, flags etc) under which a compiler might build an ELF with 4 LOAD segments?
What is the point of having 4 LOAD segments? I imagine that having a segment with read but not execute permission might help against certain exploits, but why have two such segments?
A typical BFD-ld or Gold linked Linux executable has 2 loadable segments, with the ELF header merged with .text and .rodata into the first RE segment, and .data, .bss and other writable sections merged into the second RW segment.
Here is the typical section to segment mapping:
$ echo "int foo; int main() { return 0;}" | clang -xc - -o a.out-gold -fuse-ld=gold
$ readelf -Wl a.out-gold
Elf file type is EXEC (Executable file)
Entry point 0x400420
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R 0x8
INTERP 0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0006b0 0x0006b0 R E 0x1000
LOAD 0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001f8 0x000200 RW 0x1000
DYNAMIC 0x000e28 0x0000000000401e28 0x0000000000401e28 0x0001b0 0x0001b0 RW 0x8
NOTE 0x000254 0x0000000000400254 0x0000000000400254 0x000020 0x000020 R 0x4
GNU_EH_FRAME 0x00067c 0x000000000040067c 0x000000000040067c 0x000034 0x000034 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001e8 0x0001e8 RW 0x8
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .dynsym .dynstr .gnu.hash .hash .gnu.version .gnu.version_r .rela.dyn .init .text .fini .rodata .eh_frame .eh_frame_hdr
03 .fini_array .init_array .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag
06 .eh_frame_hdr
07
08 .fini_array .init_array .dynamic .got .got.plt
This optimizes the number of mmaps that the kernel must perform to load such executable, but at a security cost: the data in .rodata shouldn't be executable, but is (because it's merged with .text, which must be executable). This may significantly increase the attack surface for someone trying to hijack a process.
Newer Linux systems, in particular using LLD to link binaries, prioritize security over speed, and put ELF header and .rodata into the first R-only segment, resulting in 3 load segments and improved security. Here is a typical mapping:
$ echo "int foo; int main() { return 0;}" | clang -xc - -o a.out-lld -fuse-ld=lld
$ readelf -Wl a.out-lld
Elf file type is EXEC (Executable file)
Entry point 0x201000
There are 10 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000200040 0x0000000000200040 0x000230 0x000230 R 0x8
INTERP 0x000270 0x0000000000200270 0x0000000000200270 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000200000 0x0000000000200000 0x000558 0x000558 R 0x1000
LOAD 0x001000 0x0000000000201000 0x0000000000201000 0x000185 0x000185 R E 0x1000
LOAD 0x002000 0x0000000000202000 0x0000000000202000 0x001170 0x002005 RW 0x1000
DYNAMIC 0x003010 0x0000000000203010 0x0000000000203010 0x000150 0x000150 RW 0x8
GNU_RELRO 0x003000 0x0000000000203000 0x0000000000203000 0x000170 0x001000 R 0x1
GNU_EH_FRAME 0x000440 0x0000000000200440 0x0000000000200440 0x000034 0x000034 R 0x1
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
NOTE 0x00028c 0x000000000020028c 0x000000000020028c 0x000020 0x000020 R 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .rodata .dynsym .gnu.version .gnu.version_r .gnu.hash .hash .dynstr .rela.dyn .eh_frame_hdr .eh_frame
03 .text .init .fini
04 .data .tm_clone_table .fini_array .init_array .dynamic .got .bss
05 .dynamic
06 .fini_array .init_array .dynamic .got
07 .eh_frame_hdr
08
09 .note.ABI-tag
Not to be left behind, the newer BFD-ld (my version is 2.31.1) also makes ELF header and .rodata read-only, but fails to merge two R-only segments into one, resulting in 4 loadable segments:
$ echo "int foo; int main() { return 0;}" | clang -xc - -o a.out-bfd -fuse-ld=bfd
$ readelf -Wl a.out-bfd
Elf file type is EXEC (Executable file)
Entry point 0x401020
There are 11 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0x00000000004002a8 0x00000000004002a8 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0003f8 0x0003f8 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x00018d 0x00018d R E 0x1000
LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x000110 0x000110 R 0x1000
LOAD 0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001e8 0x0001f0 RW 0x1000
DYNAMIC 0x002e50 0x0000000000403e50 0x0000000000403e50 0x0001a0 0x0001a0 RW 0x8
NOTE 0x0002c4 0x00000000004002c4 0x00000000004002c4 0x000020 0x000020 R 0x4
GNU_EH_FRAME 0x002004 0x0000000000402004 0x0000000000402004 0x000034 0x000034 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001c0 0x0001c0 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn
03 .init .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
06 .dynamic
07 .note.ABI-tag
08 .eh_frame_hdr
09
10 .init_array .fini_array .dynamic .got
Finally, some of these choices are affected by the --(no)rosegment (or -Wl,z,noseparate-code for BFD ld) linker option.
How is it possible to extract loadable program headers individually from ELF files?
By examining a binary using readelf one can get output similar to:
$ readelf -l helloworld
Elf file type is EXEC (Executable file)
Entry point 0x400440
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000070c 0x000000000000070c R E 200000
LOAD 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x0000000000000230 0x0000000000000238 RW 200000
DYNAMIC 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000001d0 0x00000000000001d0 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x00000000000005e4 0x00000000004005e4 0x00000000004005e4
0x0000000000000034 0x0000000000000034 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x0000000000000e10 0x0000000000600e10 0x0000000000600e10
0x00000000000001f0 0x00000000000001f0 R 1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .init_array .fini_array .jcr .dynamic .got
This question answers how loadable headers are being mapped to memory(and where) but does not specify from where(from which offset and size) are the sections read within the given binary.
Is it determined by the current program header's fields p_offset and p_filesz?
struct Proghdr {
uint32_t p_type;
uint32_t p_offset;
uint32_t p_va;
uint32_t p_pa;
uint32_t p_filesz;
uint32_t p_memsz;
uint32_t p_flags;
uint32_t p_align;
};
struct Elf *elf_header = ...
struct Proghdr *ph;
if (elf_header->e_magic != ELF_MAGIC)
goto bad;
ph = (struct Proghdr *) ((uint8_t *) elf_header + elf_header->e_phoff);
eph = ph + ELFHDR->e_phnum;
for (; ph < eph; ph++)
if(ph->p_type == PT_LOAD)
/*read_pload (dst address in memory, how many bytes to read, offset in the file) */
read_pload(ph->p_pa, ph->p_memsz, ph->p_offset);
Is it determined by the current program header's fields p_offset and p_filesz?
Yes, exactly.
get program header table address by reading e_phoff, header count (number of headers) by reading e_phnum and size of each header by reading e_phentsize from elf file header. the trick is that each header is of same size of e_phentsize. So after every e_phentsize, new header starts and headers for total e_phnum
I am running some C code, compiled to 32bit x86 on Linux. And I am trying to access some memory. Apperently I can write to .bss and .data and to the stack. Some time ago, the .ctors and .dtors segments used to be writable, but it seems they are gone.
Without trial-and-error, how can I found out to which sections in memory the segments are mapped? How can I find out which addresses are mapped to writable memory and which are executable?
Without trial-and-error, how can I found out to which sections in memory the segments are mapped?
Sections and segments have specific meaning when you talk about ELF executables, and your usage above doesn't agree with that meaning.
ELF sections don't matter at load time, only (loadable) segments do.
The readelf -l a.out command provides exactly the mapping from ELF sections to segments. E.g.
readelf -l /bin/date
Elf file type is EXEC (Executable file)
Entry point 0x8048c60
There are 6 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x000c0 0x000c0 R E 0x4
INTERP 0x0000f4 0x080480f4 0x080480f4 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x05fe0 0x05fe0 R E 0x1000
LOAD 0x006000 0x0804e000 0x0804e000 0x00208 0x00334 RW 0x1000
DYNAMIC 0x006078 0x0804e078 0x0804e078 0x000c8 0x000c8 RW 0x4
NOTE 0x000108 0x08048108 0x08048108 0x00020 0x00020 R 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata
03 .data .eh_frame .dynamic .ctors .dtors .jcr .got .bss
04 .dynamic
05 .note.ABI-tag
This tells you that .ctors is mapped to segment 3, which is writable (this output is from ancient UnitedLinux 1.0 distribution).
Nowadays, .ctors is put into a segment different from .data, and protected from writing after relocation via special GNU_RELRO segment.
Look at /proc/$pid/maps (or use the pmap utility). It'll tell you more than you ever wanted to know about memory regions.