Output relocatable section data from linker script - linker

Using commands like BYTE or LONG, it is possible to include explicit bytes of data in an output section from a linker script. The linked page also describes that those commands can be used to output the value of symbols.
I would have expected that if you perform partial linking (i.e., using the -r option of ld), relocation records would be emitted for the symbols that are output in this way. However, it seems that the linker just outputs the currently known value [1] of the symbol.
Here is a MWE to clarify what I mean.
test.c:
int foo = 1, bar = 2;
test.ld:
SECTIONS {
    .data : {
        *(.data)
        LONG(foo)
        LONG(bar)
    }
}
Then run the following:
$ gcc -c test.c
$ ld -T test.ld -r -o test.elf test.o
$ readelf -r test.elf
There are no relocations in this file.
$ readelf -x .data test.elf
Hex dump of section '.data':
0x00000000 01000000 02000000 00000000 04000000 ................
As you can see, no relocations are created, and the values that are output are the currently known values of foo and bar.
Could this be a bug? If not, is there any way to force the linker to output relocation records for symbols added to an output section?
[1] I'm not sure if this is the correct term. What I mean is the value that you see when you run readelf -s on the input object file.
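To make the hex dump above easier to read, here is a minimal Python sketch that decodes the 16 bytes of .data as four 32-bit words. It assumes a little-endian target (as on x86-64); the labels are only my reading of which word came from where.

```python
import struct

# The 16 bytes of .data as printed by `readelf -x .data test.elf` above.
data = bytes.fromhex("01000000 02000000 00000000 04000000")

# Assumption: little-endian target, so each LONG is a "<I" word.
words = struct.unpack("<4I", data)

labels = ["foo (from test.o)", "bar (from test.o)", "LONG(foo)", "LONG(bar)"]
for label, word in zip(labels, words):
    print(f"{label:18} = {word}")
```

The last two words are 0 and 4, which match the section-relative values of foo and bar in a relocatable object rather than anything that a relocation would later fix up.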

Find all symbols in a directory

I am trying to figure out which C library to link against when compiling a program that uses it through a header, in this case #include <pcre2.h>. The only way I've found to locate the file I need is to check for a specific symbol that I know must be exported. For example:
$ ls
CMakeCache.txt Makefile install_manifest.txt libpcre2-posix.pc pcre2_grep_test.sh
CMakeFiles a.out libpcre2-8.a pcre2-config pcre2_test.sh
CTestCustom.ctest cmake_install.cmake libpcre2-8.pc pcre2.h pcre2grep
CTestTestfile.cmake config.h libpcre2-posix.a pcre2_chartables.c pcre2test
$ objdump -t libpcre2-8.a|grep pcre2_compile
pcre2_compile.c.o: file format elf64-x86-64
0000000000000000 l df *ABS* 0000000000000000 pcre2_compile.c
00000000000100bc g F .text 00000000000019dd pcre2_compile_8
0000000000000172 g F .text 00000000000000e3 pcre2_compile_context_create_8
0000000000000426 g F .text 0000000000000055 pcre2_compile_context_copy_8
0000000000000557 g F .text 0000000000000032 pcre2_compile_context_free_8
And because the symbol pcre2_compile_8 exists in that file (after trying every other file...) I know that the library I need to include is pcre2-8, that is, I compile my code with:
$ gcc myfile.c -lpcre2-8 -o myfile; ./myfile
Two questions related to this:
Is there a simpler way to find a symbol in a batch of files (some of which are not ELF files)? For example, something like objdump -t *? Or what's the closest thing to doing that?
Is there a better way to find out what the library value of -l<library> is? Or, what's the common way for someone who downloads a new C program to know what to add to the command line so that the program works? (I've just spent the last hour figuring out that it's -lpcre2-8 and not -lpcre or -lpcre2.)
Usually, the function you call from the library will be a symbol defined by that library. But in PCRE2, due to different code unit sizes, the function you call (e.g. pcre2_compile) actually becomes a different symbol through preprocessor macros (e.g. pcre2_compile_8). You can find the symbol you need from the library by compiling your program and checking the undefined symbols:
$ cat test.c
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
int main() {
pcre2_compile("",0,0,NULL,NULL,NULL);
}
$ gcc -c test.c
$ nm -u test.o
U _GLOBAL_OFFSET_TABLE_
U pcre2_compile_8
Is there a simpler way to find a symbol in a batch of files?
You can search a directory (/usr/lib/ below) for library files (.a or .so extension below), run nm on each, and search for the symbol that was undefined in your object file (adapted from this question):
$ for lib in $(find /usr/lib/ -name \*.a -o -name \*.so)
> do
> nm -A --defined-only $lib 2>/dev/null| grep pcre2_compile_8
> done
/usr/lib/x86_64-linux-gnu/libpcre2-8.a:libpcre2_8_la-pcre2_compile.o:0000000000007f40 T pcre2_compile_8
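The same search can be sketched in Python if you prefer exact symbol matching over grep. This is only an illustration: the helper names definitions_in and search_directory are made up here, it assumes nm is on the PATH, and it assumes file paths contain no whitespace (the parser splits each line into three fields).

```python
import subprocess
from pathlib import Path

def definitions_in(nm_output, symbol):
    """Return (location, type) pairs from `nm -A` output that define `symbol` exactly."""
    hits = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:  # skip blank lines and anything unexpected
            continue
        location, sym_type, name = fields
        if name == symbol:
            # location is "file:address" or "archive:member:address"
            hits.append((location.rsplit(":", 1)[0], sym_type))
    return hits

def search_directory(root, symbol):
    """Run nm on every .a and .so file under `root` and collect definitions of `symbol`."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.suffix not in (".a", ".so") or not path.is_file():
            continue
        result = subprocess.run(
            ["nm", "-A", "--defined-only", str(path)],
            capture_output=True, text=True,
        )
        hits.extend(definitions_in(result.stdout, symbol))
    return hits
```

Calling search_directory("/usr/lib", "pcre2_compile_8") would then return the archive members that define the symbol, in the same spirit as the shell loop.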
Is there a better way to find out what the library value of -l<library> is?
It is usually conveyed through the library documentation. For PCRE2, the second page of the documentation talks about the pcre2-config tool, which gives the appropriate flags:
pcre2-config returns the configuration of the installed PCRE2 libraries and the options required to compile a program to use them. Some of the options apply only to the 8-bit, or 16-bit, or 32-bit libraries, respectively, and are not available for libraries that have not been built.
[...]
--libs8 Writes to the standard output the command line options required to link with the 8-bit PCRE2 library (-lpcre2-8 on many systems).
[...]
--cflags Writes to the standard output the command line options required to compile files that use PCRE2 (this may include some -I options, but is blank on many systems).
So for this particular library, the recommended way to build and link is:
gcc -c $(pcre2-config --cflags) test.c -o test.o
gcc test.o -o test $(pcre2-config --libs8)

Master Boot Record using GNU Assembly: extra bytes in flat binary output

I am trying to assemble the following simple MBR:
.code16
.globl _start
.text
_start:
end:
    jmp end
# Don't bother with 0xAA55 yet
I run the following commands:
> as --32 -o boot.o boot.s
> ld -m elf_i386 boot.o --oformat=binary -o mbr -Ttext 0x7c00
However, I get a binary file of more than 129MB, which seems strange to me. What is going on in that build process? Thank you very much.
Running objdump over boot.o gives me:
> objdump -s boot.o
boot.o: file format elf32-i386
Contents of section .text:
0000 ebfe ..
Contents of section .note.gnu.property:
0000 04000000 18000000 05000000 474e5500 ............GNU.
0010 020001c0 04000000 00000000 010001c0 ................
0020 04000000 01000000
Manually removing the section .note.gnu.property before calling ld seems to solve the problem. However, I don't know why this section appears by default... Running the following build commands seems to solve the problem too:
> as --32 -o boot.o boot.s -mx86-used-note=no
> ld -m elf_i386 boot.o --oformat=binary -o mbr -Ttext 0x7c00
ld links all your sections into the flat binary output unless you tell it not to (with a linker script for example).
The extra bytes come from the .note.gnu.property section that the assembler (as) adds. It can record things like the x86 ISA level in use (e.g. AVX2+FMA+BMI2, the Haswell feature level, is x86-64-v3). You don't want that in your flat binary, especially not at its default high address, far from where you tell ld to put your .text section with -Ttext. Because a flat binary has no headers to describe separate segments, the gap between the two addresses is padded with zeros, which results in a huge file.
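As a rough illustration of where the size comes from, here is a small Python sketch that computes how large a flat image must be to cover a set of sections. The function name is made up, and the note address 0x08048000 is only an assumed stand-in for "somewhere near the default i386 load address"; the real address of the note section may differ somewhat.

```python
def flat_binary_size(sections):
    """Size of a flat image covering every (address, size) pair in `sections`.

    A flat binary has no headers, so the file must span from the lowest
    section address to the end of the highest one; gaps become padding.
    """
    start = min(address for address, _ in sections)
    end = max(address + size for address, size in sections)
    return end - start

# The 2-byte "jmp end" placed by -Ttext 0x7c00.
text_only = [(0x7C00, 2)]
# Assumption: the 0x28-byte note lands near 0x08048000 (hypothetical address).
with_note = text_only + [(0x08048000, 0x28)]

print(flat_binary_size(text_only), "bytes")
print(round(flat_binary_size(with_note) / 2**20, 2), "MiB")
```

Under that assumption the image grows from a couple of bytes to well over a hundred megabytes, which is the same order of magnitude as the file in the question.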
Using as -mx86-used-note=no will omit that section from the .o in the first place, leaving only the sections you define in your asm source. From the GAS manual's i386 options:
-mx86-used-note=no
-mx86-used-note=yes
These options control whether the assembler should generate GNU_PROPERTY_X86_ISA_1_USED and GNU_PROPERTY_X86_FEATURE_2_USED GNU
property notes. The default can be controlled by the
--enable-x86-used-note configure option.
Using the -mx86-used-note=no flag with as will remove the note section.
See https://sourceware.org/binutils/docs/as/i386_002dOptions.html

What is the difference between executable files?

I have the following C program:
#include<stdio.h>
int main()
{
printf("hhhh");
return 0;
}
Commands to compile, copy and compare:
$ gcc print.c -o a.out
$ objcopy a.out b.out
$ cmp a.out b.out
I have compiled this program and created an executable. Then, I have used the objcopy command to make a copy of the executable. But, when I compare these files, I get this:
files differ: byte 41, line 1
How can I know what contents are missing?
Any help or pointers would be appreciated. Thanks!
How can I know what contents are missing?
What made you believe that any content is missing?
The way objcopy works is:
Parse the contents of the input file into an internal representation.
Copy parts of the original file to the output file, as instructed by the options.
Nowhere does objcopy guarantee that when "copy entire file" is given, the result will be bit-identical.
In particular, non-loadable sections could be reordered or changed in other ways.
The difference is that the EntSize of the .init_array section is 0 bytes in a.out and 8 bytes in b.out.
An EntSize of 0 doesn't make sense for a non-empty section. If you really have such a section in your a.out, it's likely that your linker has a bug.
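To see which part of the file the first difference reported by cmp falls in, you can map the byte number to a field of the ELF file header. The sketch below is only a convenience, not something objcopy or cmp provide; it assumes a 64-bit ELF file (as produced by gcc on x86-64), and the function name is made up.

```python
# (name, offset, size) of each field in the 64-byte ELF64 file header.
ELF64_HEADER_FIELDS = [
    ("e_ident", 0, 16),
    ("e_type", 16, 2),
    ("e_machine", 18, 2),
    ("e_version", 20, 4),
    ("e_entry", 24, 8),
    ("e_phoff", 32, 8),
    ("e_shoff", 40, 8),
    ("e_flags", 48, 4),
    ("e_ehsize", 52, 2),
    ("e_phentsize", 54, 2),
    ("e_phnum", 56, 2),
    ("e_shentsize", 58, 2),
    ("e_shnum", 60, 2),
    ("e_shstrndx", 62, 2),
]

def header_field_for_cmp_byte(cmp_byte):
    """Map a 1-based byte number, as printed by cmp, to an ELF64 header field.

    Returns None when the byte lies beyond the 64-byte file header.
    """
    offset = cmp_byte - 1  # cmp counts bytes from 1; file offsets start at 0
    for name, start, size in ELF64_HEADER_FIELDS:
        if start <= offset < start + size:
            return name
    return None

print(header_field_for_cmp_byte(41))  # prints e_shoff
```

Under the 64-bit assumption, byte 41 is the first byte of e_shoff, the file offset of the section header table, so the two files place that table at different offsets. Comparing readelf -h and readelf -S for both files shows the details.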

How to make objdump show assembly of sections only appeared in source code?

I would like to produce assemblies like the one in the answer of this question Using GCC to produce readable assembly?
for simple test code: test.c
#include <stdio.h>
int main(void){
    int i;
    for(i=0;i<10;i++){
        printf("%d\n",i);
    }
    return 0;
}
gcc command : gcc -g test.c -o test.o
objdump command: objdump -d -M intel -S test.o
But what I got is a disassembly that starts with the .init section (080482bc <_init>:) and ends with the .fini section (080484cc <_fini>), which I do not want to be shown.
Why is this happening, and how can I avoid showing sections that are not in the source file?
Right now you're creating an executable file and not an object file. The executable file of course contains a lot of extra sections.
If you want to create an object file, use the -c flag to GCC.
You can specify sections using -j option.
So objdump -d executable -j .text -j .plt will only show disassembly from .text and .plt sections.

Is there an option to GNU ld to omit -dynamic-linker (PT_INTERP) completely?

I'm experimenting with the concept of pure-static-linked PIE executables on Linux, but running into the problem that the GNU binutils linker insists on adding a PT_INTERP header to the output binary when -pie is used, even when also given -static. Is there any way to inhibit this behavior? That is, is there a way to tell GNU ld specifically not to write certain headers to the output file? Perhaps with a linker script?
(Please don't answer with claims that it won't work; I'm well aware that the program still needs relocation processing - load-address-relative relocations only due to my use of -Bsymbolic - and I have special startup code in place of the standard Scrt1.o to handle this. But I can't get it to be invoked without the dynamic linker already kicking in and doing the work unless I hex-edit the PT_INTERP header out of the binary.)
Maybe I'm being naïve, but... wouldn't it suffice to find the default linker script, edit it, and remove the line that links in the .interp section?
For example, on my machine the scripts are in /usr/lib/ldscripts and the line in question is .interp : { *(.interp) } in the SECTIONS section.
You can dump the default script by running the following command:
$ ld --verbose ${YOUR_LD_FLAGS} | \
gawk 'BEGIN { s = 0 } { if ($0 ~ /^=/) s = !s; else if (s == 1) print; }'
You can modify the gawk script slightly to remove the interp line (or just use grep -v) and use that script to link your program.
I think I might have found a solution: simply using -shared instead of -pie to make pie binaries. You need a few extra linker options to patch up the behavior, but it seems to avoid the need for a custom linker script. Or in other words, the -shared linker script is already essentially correct for linking static pie binaries.
If I get it working with this, I'll update the answer with the exact command line I'm using.
Update: It works! Here's the command line:
gcc -shared -static-libgcc -Wl,-static -Wl,-Bsymbolic \
-nostartfiles -fPIE Zcrt1.s Zcrt2.c /usr/lib/crti.o hello.c /usr/lib/crtn.o
where Zcrt1.s is a modified version of Scrt1.s that calls a function in Zcrt2.c before doing its normal work, and the code in Zcrt2.c processes the aux vector just past the argv and environment arrays to find the DYNAMIC section, then loops over the relocation tables and applies all the relative-type relocations (the only ones that should exist).
Now all of this can (with a little work) be wrapped up into a script or gcc specfile...
Expanding on my earlier note, as this doesn't fit in that puny box (and this is just an idea for discussion; please do not feel obligated to accept or reward the bounty): perhaps the easiest and cleanest way of doing this is to just add a post-build step that strips the PT_INTERP header from the resulting binary?
Even easier than manually editing the headers and potentially having to shift everything around is to just replace PT_INTERP with PT_NULL. I don't know whether you can find a way of simply patching the file via existing tools (some sort of scriptable hex find and replace) or if you'll have to write a small program to do that. I do know that libbfd (the GNU Binary File Descriptor library) might be your friend in the latter case, as it'll make that entire business a lot easier.
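For the "small program" route, a minimal sketch in Python (rather than libbfd) could look like the following. It relies only on the standard ELF header layout: p_type is the first 32-bit word of every program header, PT_INTERP is 3 and PT_NULL is 0. The function name is made up, and the sketch does no validation beyond the magic number.

```python
import struct

PT_NULL = 0
PT_INTERP = 3

def neutralize_pt_interp(image):
    """Rewrite every PT_INTERP program header in `image` (a bytearray) to PT_NULL.

    Returns the number of headers changed. Only the 4-byte p_type field
    is touched, so nothing in the file has to move.
    """
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = image[4] == 2                # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if image[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big

    if is_64bit:
        (phoff,) = struct.unpack_from(endian + "Q", image, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", image, 54)
    else:
        (phoff,) = struct.unpack_from(endian + "I", image, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", image, 42)

    changed = 0
    for index in range(phnum):
        entry = phoff + index * phentsize
        (p_type,) = struct.unpack_from(endian + "I", image, entry)
        if p_type == PT_INTERP:
            struct.pack_into(endian + "I", image, entry, PT_NULL)
            changed += 1
    return changed
```

To use it as a post-build step, read the binary into a bytearray, call neutralize_pt_interp on it, and write the bytes back; readelf -l should then no longer list an INTERP entry.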
I guess I just don't understand why it's important to have this performed via an ld option. If available, I can see why it would be preferable; but as some (admittedly light) Googling indicates there isn't such a feature, it might be less of a hassle to just do it separately and after-the-fact. (Perhaps adding the flag to ld is easier than scripting the replacement of PT_INTERP with PT_NULL, but convincing the devs to pull it upstream is a different matter.)
Apparently (and please correct me if this is something you've already seen) you can override the behavior of ld with regard to any of the ELF program headers in your linker script with the PHDRS command, using :NONE to specify that a particular section should not be placed in any segment. I'm not certain of the syntax, but I presume it would look something like this:
PHDRS
{
    headers PT_PHDR PHDRS ;
    interp PT_INTERP ;
    text PT_LOAD FILEHDR PHDRS ;
    data PT_LOAD ;
    dynamic PT_DYNAMIC ;
}
SECTIONS
{
    . = SIZEOF_HEADERS;
    .interp : { } :NONE
    ...
}
From the ld docs you can override the linker script with --library-path:
--library-path=searchdir
Add path searchdir to the list of paths that ld will search for
archive libraries and ld control scripts. You may use this option any
number of times. The directories are searched in the order in which
they are specified on the command line. Directories specified on the
command line are searched before the default directories. All -L
options apply to all -l options, regardless of the order in which the
options appear. The default set of paths searched (without being
specified with `-L') depends on which emulation mode ld is using, and
in some cases also on how it was configured. See section Environment
Variables. The paths can also be specified in a link script with the
SEARCH_DIR command. Directories specified this way are searched at the
point in which the linker script appears in the command line.
Also, from the section on Implicit Linker Scripts:
If you specify a linker input file which the linker can not recognize
as an object file or an archive file, it will try to read the file as
a linker script. If the file can not be parsed as a linker script, the
linker will report an error.
Which would seem to imply values in user-defined linker scripts, in contrast with implicitly defined linker scripts, will replace values in the default scripts.
I'm not an expert in GNU ld, but I found the following information in the documentation:
The special secname `/DISCARD/' may be used to discard input sections.
Any sections which are assigned to an output section named `/DISCARD/'
are not included in the final link output.
I hope this will help you.
UPDATE:
(This is the first version of the solution, which doesn't work because the .interp section is dropped along with the PT_INTERP header.)
main.c:
int main(int argc, char **argv)
{
    return 0;
}
main.x:
SECTIONS {
    /DISCARD/ : { *(.interp) }
}
build command:
$ gcc -nostdlib -pie -static -Wl,-T,main.x main.c
$ readelf -S a.out | grep .interp
build command without option -Wl,-T,main.x:
$ gcc -nostdlib -pie -static main.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000218
$ readelf -S a.out | grep .interp
[ 1] .interp PROGBITS 00000134 000134 000013 00 A 0 0 1
UPDATE 2:
The idea of this solution is that the original .interp section is renamed to .interp1. In other words, the entire contents of the section are placed in the .interp1 section. Therefore, we can safely remove the (now empty) .interp section without fear of losing the default linker script settings, and hence the PT_INTERP header will be removed too.
SECTIONS {
    .interp1 : { *(.interp); } : NONE
    /DISCARD/ : { *(.interp) }
}
In order to show that the contents of the .interp section are present in the file (as .interp1) but the PT_INTERP header is removed, I use a combination of readelf and grep.
$ gcc -nostdlib -pie -Wl,-T,main.x main.c
$ readelf -l a.out | grep interp
00 .note.gnu.build-id .text .interp1 .dynstr .hash .gnu.hash .dynamic .got.plt
$ readelf -S a.out | grep interp
[ 3] .interp1 PROGBITS 0000002e 00102e 000013 00 A 0 0 1
The option -Wl,--no-dynamic-linker solves the issues with binutils 2.26 or later.
