What is the *ABS* section and when should it be used?

// foo.c
int main() { return 0; }
When I compiled the code above I noticed some symbols located in *ABS*:
$ gcc foo.c
$ objdump -t a.out | grep ABS
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000000000 l df *ABS* 0000000000000000 foo.c
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000000000 l df *ABS* 0000000000000000
They look like debug symbols, but isn't debug info stored somewhere like the .debug_info section?
According to man objdump:
*ABS* if the section is absolute (ie not connected with any section)
I don't understand it since no example is given here.
Another question shows an interesting way to pass extra symbols in *ABS* with --defsym, but I think passing macros would be easier.
So what is this *ABS* section and when would someone use it?
EDIT:
Absolute symbols don't get relocated, their virtual addresses (0000000000000000 in the example you gave) are fixed.
I wrote a demo but it seems that the addresses of absolute symbols can be modified.
// foo.c
#include <stdio.h>
extern char foo;
int main()
{
printf("%p\n", &foo);
return 0;
}
$ gcc foo.c -Wl,--defsym,foo=0xbeef -g
$ objdump -t a.out | grep ABS
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000000000 l df *ABS* 0000000000000000 foo.c
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000000000 l df *ABS* 0000000000000000
000000000000beef g *ABS* 0000000000000000 foo
# the addresses are not fixed
$ ./a.out
0x556e06629eef
$ ./a.out
0x564f0d7aeeef
$ ./a.out
0x55c2608dceef
# gdb shows that before entering main(), &foo == 0xbeef
$ gdb a.out
(gdb) p &foo
$1 = 0xbeef <error: Cannot access memory at address 0xbeef>
(gdb) br main
Breakpoint 1 at 0x6b4: file foo.c, line 7.
(gdb) r
Starting program: /home/user/a.out
Breakpoint 1, main () at foo.c:7
7 printf("%p", &foo);
(gdb) p &foo
$2 = 0x55555555feef <error: Cannot access memory at address 0x55555555feef>

If you look at other symbols you might find an index (or section name if the reader does the mapping for you) in place of *ABS*. This is a section index in the section headers table. It points to the section header of a section the symbol is defined in (or SHN_UNDEF (zero) if it is undefined in the object you are looking at). So the value (virtual address) of a symbol will be adjusted by the same value its containing section is adjusted during loading. (This process is called relocation.) Not so for absolute symbols (having special value SHN_ABS as their st_shndx). Absolute symbols don't get relocated, their virtual addresses (0000000000000000 in the example you gave) are fixed.
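To make the st_shndx cases above concrete, here is a minimal sketch of how a symbol-table reader might pick the text for the section column. The numeric values are the ones the ELF specification assigns to SHN_UNDEF, SHN_ABS and SHN_COMMON (they are redefined under EX_ names so the snippet stands alone without <elf.h>); the function name shndx_label is hypothetical, and other reserved indices are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Special section indices; <elf.h> defines the same values as
 * SHN_UNDEF, SHN_ABS and SHN_COMMON. */
enum {
    EX_SHN_UNDEF  = 0,
    EX_SHN_ABS    = 0xfff1,
    EX_SHN_COMMON = 0xfff2
};

/* Hypothetical helper: returns the marker for a special st_shndx value,
 * or NULL when the value is an ordinary index into the section header
 * table (the caller would then look up the section name). */
const char *shndx_label(uint16_t st_shndx)
{
    switch (st_shndx) {
    case EX_SHN_UNDEF:  return "*UND*";  /* not defined in this object */
    case EX_SHN_ABS:    return "*ABS*";  /* absolute, tied to no section */
    case EX_SHN_COMMON: return "*COM*";  /* common (tentative) symbol */
    default:            return NULL;     /* regular section index */
    }
}
```

A reader would call shndx_label(sym.st_shndx) for each symbol and fall back to the section name when it returns NULL.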
Such absolute symbols are sometimes used to store some meta information. In particular, the compiler can create symbols with symbol names equivalent to the names of translation units it compiles. Such symbols aren't needed for linking or running the program, they are just for humans and binary processing tools.
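The "l df *ABS*" rows in the question are such file-name symbols. As a rough sketch, assuming the usual ELF encoding (binding in the high nibble of st_info, type in the low nibble, STT_FILE symbols being local and carrying SHN_ABS), a tool could recognize them like this. The EX_ constants mirror the <elf.h> values and is_file_symbol is a made-up name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined by the ELF specification (STB_LOCAL, STT_FILE,
 * SHN_ABS in <elf.h>); redefined here so the sketch is standalone. */
enum {
    EX_STB_LOCAL = 0,
    EX_STT_FILE  = 4,
    EX_SHN_ABS   = 0xfff1
};

/* Hypothetical helper: true for a symbol that only records the name of
 * a translation unit, such as "foo.c" or "crtstuff.c". */
bool is_file_symbol(unsigned char st_info, uint16_t st_shndx)
{
    unsigned bind = st_info >> 4;    /* same as ELF64_ST_BIND(st_info) */
    unsigned type = st_info & 0xf;   /* same as ELF64_ST_TYPE(st_info) */

    return bind == EX_STB_LOCAL &&
           type == EX_STT_FILE &&
           st_shndx == EX_SHN_ABS;
}
```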
As for why this isn't stored in the .debug_info section (and why it is emitted even though no debug switches were specified): it is a separate thing, namely the symbol table (.symtab). It is also needed for debugging, sure, but its primary purpose is linking object (.o) files. By default it is preserved in linked executables and libraries. You can get rid of it with strip.
Much of what I wrote here is in man 5 elf.
I don't think doing what you are doing (with --defsym) is supported/supposed to work with dynamic linking. Looking at the compiler output (gcc -S -masm=intel), I see this
lea rsi, foo[rip]
Or, if we look at objdump -M intel -rD a.out (linking with -q to preserve relocations), we see the same thing: rip-relative addressing is used to get the address of foo.
113d: 48 8d 35 ab ad 00 00 lea rsi,[rip+0xadab] # beef <foo>
1140: R_X86_64_PC32 foo-0x4
The compiler doesn't know that it's going to be an absolute symbol, so it produces the code it does (as for a normal symbol). rip is the instruction pointer, so it depends on the base address of the segment containing .text after the program is mapped into memory by ld.so.
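The arithmetic behind this can be sketched in a few lines. The first function reproduces what the lea above computes at link time; the second shows what happens once the image is mapped at some load base. The function names are hypothetical, and the load base 0x555555554000 used in the comment is inferred from the gdb output in the question (0x55555555feef minus 0xbeef), not stated there.

```c
#include <stdint.h>

/* Address referred to by an x86-64 RIP-relative operand: the signed
 * 32-bit displacement is added to the address of the *next* instruction.
 * For the lea above: 0x113d + 7 + 0xadab = 0xbeef. */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len,
                             int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* Address the same computation yields at run time. The instruction has
 * moved by load_base, but the displacement is unchanged, so the result
 * moves by load_base too, even though foo was meant to be absolute.
 * Example: 0x555555554000 + 0xbeef = 0x55555555feef. */
uint64_t runtime_address(uint64_t load_base, uint64_t link_time_addr)
{
    return load_base + link_time_addr;
}
```

This matches the observation that the low bits of the printed address are always ...eef while the upper bits change from run to run.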
I found this answer shedding light on the proper use-case for absolute symbols.

Related

How to link file generated with --relocatable in a PIE executable?

I have a big text file that I want to include in a C program. I could just make it a string literal but it's pretty big and that would be cumbersome. So I'm currently linking like this:
$ ld -r -b binary -o /tmp/stuff.o /tmp/stuff.txt
$ clang -o myprogram main.o /tmp/stuff.o
Objdump output:
$ objdump -t /tmp/stuff.o
/tmp/stuff.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000006 g *ABS* 0000000000000000 _binary__tmp_stuff_txt_size
0000000000000006 g .data 0000000000000000 _binary__tmp_stuff_txt_end
0000000000000000 g .data 0000000000000000 _binary__tmp_stuff_txt_start
In the code, I do this (gotten from this question):
extern char _binary__tmp_stuff_txt_start[];
extern char _binary__tmp_stuff_txt_size[];
int f(void) {
size_t size = (size_t)_binary__tmp_stuff_txt_size;
do_stuff(size, _binary__tmp_stuff_txt_start);
}
Everything works great, but when I compile with GCC instead of Clang, it segfaults. Looking at it in GDB, the size variable, initialized with size_t size = (size_t)_binary__tmp_stuff_txt_size;, is garbage. It seems that when GCC links, it passes the -pie flag to ld but Clang doesn't. I could fix this by just passing -no-pie to GCC, but it seems kind of sad that doing something so simple would prevent using PIE. Is there something I should change to make this work?
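One workaround that is often suggested, sketched here under the assumption that the _end symbol is emitted in .data as in the objdump output above: derive the size from the two section-relative symbols, which are relocated together, instead of reading the absolute _size symbol (whose value is presumably shifted by the load base in a PIE, as in the --defsym experiment earlier on this page). blob_size is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: size of a blob delimited by two symbols that
 * live in the same section. Both pointers are relocated by the same
 * amount, so their difference is the same with or without PIE. */
size_t blob_size(const char *start, const char *end)
{
    return (size_t)(end - start);
}
```

In the program from the question this would be called as blob_size(_binary__tmp_stuff_txt_start, _binary__tmp_stuff_txt_end), with _binary__tmp_stuff_txt_end declared as extern char[] like the start symbol.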

gcc weak attribute inconsistent behaviour

I am using the gcc compiler from PowerShell on Windows 10. This gcc came with the Atollic TrueSTUDIO IDE. The reason I am doing this is to be able to create an .exe file from the C code so that unit testing becomes easier.
I encounter a linker error (undefined reference to 'function_name') when there is a function that is defined as weak and that function is used in another .c file.
Meanwhile I do not get this linker error if I use arm-atollic-eabi-gcc or gcc running on ubuntu.
Here is a simple code to demonstrate this:
hello.c:
#include "weak.h"
void whatever(void)
{
iamweak();
}
weak.c:
#include <stdio.h>
#include "weak.h"
void __attribute__((weak)) iamweak(void)
{
printf("i am weak...\n");
}
weak.h
void iamweak(void);
main.c
int main(void)
{
return 0;
}
Creating the object files and linking:
> gcc -c main.c weak.c hello.c
> gcc -o main.exe main.o weak.o hello.o
> hello.o:hello.c:(.text+0x7): undefined reference to `iamweak'
collect2.exe: error: ld returned 1 exit status
Now I checked with gcc-nm the symbol table of hello.o:
> gcc-nm hello.o
00000000 b .bss
00000000 d .data
00000000 r .eh_frame
00000000 r .rdata$zzz
00000000 t .text
U _iamweak
00000000 T _whatever
Symbol table for weak.o:
>gcc-nm weak.o
00000000 b .bss
00000000 d .data
00000000 r .eh_frame
00000000 r .rdata
00000000 r .rdata$zzz
00000000 t .text
00000000 T .weak._iamweak.
w _iamweak
U _puts
Now when I use gcc on Ubuntu as I said everything works. Also the symbol tables are a little different.
Symbol table for hello.o:
nm hello.o
U _GLOBAL_OFFSET_TABLE_
U iamweak
0000000000000000 T whatever
Symbol table for weak.o:
nm weak.o
U _GLOBAL_OFFSET_TABLE_
0000000000000000 W iamweak
U puts
From https://linux.die.net/man/1/nm it says that "If lowercase, the symbol is local; if uppercase, the symbol is global (external)."
So iamweak is local in windows10 and global in Ubuntu. Is that why the linker cannot see it? What can I do about this? The weak function definitions are also in some HAL libraries and I don't want to modify those. Is there a workaround?
This is a bug in the Atollic gcc fork. It can do even worse:
main:
00401440: push %ebp
00401441: mov %esp,%ebp
00401443: and $0xfffffff0,%esp
00401446: call 0x401970 <__main>
36 iamweak();
0040144b: call 0x0
37 return 0;
00401450: mov $0x0,%eax
38 }
the complete atollic studio project here

Generating both a library and binary from a single C source

I have a C source file with a main function and some other functions. Something like:
#include "stdlib.h"
int program(int argc, char ** argv)
{
int a = atoi(argv[1]);
int b = atoi(argv[2]);
return a + b;
}
int main(int argc, char ** argv)
{
return program(argc, argv);
}
I know how to compile this to produce a binary.
Is there a way to compile this into an object file with the main symbol/function omitted?
I understand that I could accomplish my goal by splitting main into its own file, but suppose I don't want to do that.
Usually, having a definition of main() in a library is not a problem because the linker would only use it if there were no main() in any non-library binary. That can even be used to advantage, to include a default main(). See, for example, the POSIX standard -ll library used with lex (or -lfl if you use flex).
If you really want to ensure that the symbol is not available for resolution, you can remove the symbol from the library. There are tools for manipulating binary files, which vary from system to system. For example, take a look at the --strip-symbol option of objcopy. (That doesn't remove the compiled code; it just makes it unresolvable.)
A library is simply an archive of object modules - to omit main() it must either be in a separate object module which you then simply omit from the library build, or you use conditional compilation so that it is omitted at compile time.
In fact, if main were in a separate object module, it would not matter whether it was omitted or not, since any definition in a directly linked object module overrides a static library definition; the library definition is only used if nothing else defines the symbol. I am not sure whether this works if main() is defined in a module containing other symbols that are referenced in the binary, but the worst that can happen if you try it is a duplicate symbol error.
Is there a way to compile this into an object file with the main
symbol/function omitted?
So you don't want symbol main in your object files.
This might be one way.
file.c
#include "stdlib.h"
int program(int argc, char ** argv)
{
int a = atoi(argv[1]);
int b = atoi(argv[2]);
return a + b;
}
int not_main(int argc, char ** argv)
{
exit(0);
}
and then compile
[gcc]
gcc file.c -o file -e not_main -nostartfiles
SYMBOL TABLE:
0000000000000238 l d .interp 0000000000000000 .interp
0000000000000254 l d .note.gnu.build-id 0000000000000000 .note.gnu.build-id
0000000000000278 l d .gnu.hash 0000000000000000 .gnu.hash
0000000000000298 l d .dynsym 0000000000000000 .dynsym
00000000000002e0 l d .dynstr 0000000000000000 .dynstr
0000000000000302 l d .gnu.version 0000000000000000 .gnu.version
0000000000000308 l d .gnu.version_r 0000000000000000 .gnu.version_r
0000000000000328 l d .rela.plt 0000000000000000 .rela.plt
0000000000000360 l d .plt 0000000000000000 .plt
0000000000000390 l d .text 0000000000000000 .text
00000000000003f0 l d .eh_frame_hdr 0000000000000000 .eh_frame_hdr
0000000000000418 l d .eh_frame 0000000000000000 .eh_frame
0000000000200e78 l d .dynamic 0000000000000000 .dynamic
0000000000200fd8 l d .got 0000000000000000 .got
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 l df *ABS* 0000000000000000 file.c
0000000000000000 l df *ABS* 0000000000000000
0000000000200e78 l O .dynamic 0000000000000000 _DYNAMIC
00000000000003f0 l .eh_frame_hdr 0000000000000000 __GNU_EH_FRAME_HDR
0000000000200fd8 l O .got 0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000201000 g .got 0000000000000000 _edata
00000000000003d5 g F .text 0000000000000019 not_main
0000000000000390 g F .text 0000000000000045 program
0000000000201000 g .got 0000000000000000 _end
0000000000201000 g .got 0000000000000000 __bss_start
0000000000000000 F *UND* 0000000000000000 atoi@@GLIBC_2.2.5
0000000000000000 F *UND* 0000000000000000 exit@@GLIBC_2.2.5
Is there a way to compile this into an object file with the main symbol/function omitted?
Yes, by using preprocessor tricks and/or preprocessor options to the compiler.
Change your C code (in your file mycode.c) to contain:
#ifdef HAVE_MAIN
int main(int argc, char ** argv)
{
return program(argc, argv);
}
#endif
Then, to get only an object file mycode.o, compile as gcc -Wall -Wextra -g mycode.c -c -o mycode.o (if using GCC)
To get the entire program myprog, compile it as gcc -Wall -Wextra -g -DHAVE_MAIN mycode.c -o myprog
You could (avoiding any #ifdef HAVE_MAIN) even compile with gcc -Wall -Wextra -g -Dmain=mymain -c mycode.c to get the main function renamed, by preprocessing, as mymain (and then it is losing its magical status of "entry point").
However, doing that is considered bad taste (the code is not very readable). You'd better put your main in a different translation unit and compile it only when you want a whole program. Quite often, a library (or an executable) is made from several translation units, each compiled into an object file, and the set of object files gets linked together. In practice you'll use a build automation tool (e.g. make or ninja) to build it.

Entry point address of a PIE program

How do I know the actual entry point address of a PIE program on Linux/Android?
I can read the entry point address using readelf -l, but for an ELF file compiled and linked with -pie or -fPIE, the actual entry point address will differ from it. How can I get that address at run time? That is, how can I find out where the program is loaded into memory?
The entry point of a program is always available to it as the address of
the symbol _start.
main.c
#include <stdio.h>
extern char _start;
int main()
{
printf("&_start = %p\n",&_start);
return 0;
}
Compile and link -no-pie:
$ gcc -no-pie main.c
Then we see:
$ nm a.out | grep '_start'
0000000000601030 B __bss_start
0000000000601020 D __data_start
0000000000601020 W data_start
w __gmon_start__
0000000000600e10 t __init_array_start
U __libc_start_main@@GLIBC_2.2.5
0000000000400400 T _start
^^^^^^^^^^^^^^^
and:
$ readelf -h a.out | grep Entry
Entry point address: 0x400400
and:
$ ./a.out
&_start = 0x400400
Compile and link -pie:
$ gcc -pie main.c
Then we see:
$ nm a.out | grep '_start'
0000000000201010 B __bss_start
0000000000201000 D __data_start
0000000000201000 W data_start
w __gmon_start__
0000000000200db8 t __init_array_start
U __libc_start_main@@GLIBC_2.2.5
0000000000000540 T _start
^^^^^^^^^^^^
and:
$ readelf -h a.out | grep Entry
Entry point address: 0x540
and:
$ ./a.out
&_start = 0x560a8dc5e540
^^^
So the PIE program is entered at its nominal entry point 0x540 plus 0x560a8dc5e000.
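As a complementary sketch, assuming Linux and a libc that provides getauxval() in <sys/auxv.h> (glibc does), the kernel also reports the runtime entry point through the auxiliary vector as AT_ENTRY. Subtracting the nominal entry point from readelf then gives the load offset. The two function names below are hypothetical.

```c
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval(), AT_ENTRY */

/* Entry point of the running program as recorded by the kernel in the
 * auxiliary vector; getauxval() returns 0 if the entry is missing. */
uint64_t runtime_entry_point(void)
{
    return (uint64_t)getauxval(AT_ENTRY);
}

/* Offset at which the image was loaded: runtime entry point minus the
 * nominal one from "readelf -h". With the numbers above:
 * 0x560a8dc5e540 - 0x540 = 0x560a8dc5e000. */
uint64_t load_offset(uint64_t runtime_entry, uint64_t nominal_entry)
{
    return runtime_entry - nominal_entry;
}
```

For a non-PIE executable the two entry points coincide, so the offset comes out as 0.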

Is extern optional?

I am sure I am going crazy, but consider the following C code:
// file1.c
int first;
void f(void)
{ first = 2; }
// file2.c
#include <stdio.h>
int first;
void f();
int main(void)
{
first = 1;
f();
printf("%d", first);
}
These two files, for some reason will compile and link together, and print 2. I was always under the impression that unless I labelled one or the other (but not both) definitions of first with extern, this wouldn't compile, and that was in fact the whole point of extern!
It only compiles because first is only declared twice, there are not actually two places in memory but only one. Just initialize the one first with int first=4; and the other with int first=5; and your linker will show you the error, e.g. GCC:
b.o:b.c:(.data+0x0): multiple definition of `_first'
a.o:a.c:(.data+0x0): first defined here
collect2.exe: error: ld returned 1 exit status
Under normal conditions (no extra gcc flags) you should be fine to compile this code as:
gcc file1.c file2.c
What's going to happen is the compiler will see that you have two global variables named the same thing and neither is initialized. Then it will place your uninitialized global variables in the "common" section of the code**. In other words it's going to have only 1 copy of the "first" variable. This happens because GCC versions before 10 default to -fcommon. GCC 10 and later default to -fno-common, so with those versions you only get this behaviour if you pass -fcommon explicitly.
If you were to compile with the -fno-common flag you'd now receive the error you were thinking of:
/tmp/ccZNeN8c.o:(.bss+0x0): multiple definition of `first'
/tmp/cc09s2r7.o:(.bss+0x0): first defined here
collect2: ld returned 1 exit status
To resolve this you'd add extern to all but one of the variables.
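A minimal sketch of that arrangement, collapsed into a single file so it stands alone: in a real project the extern declaration would sit in a shared header and the definition in exactly one .c file. The function run_example is a hypothetical stand-in for main() from file2.c that returns the value instead of printing it.

```c
/* What the shared header would contain: a declaration only. It reserves
 * no storage and may appear in as many files as needed. */
extern int first;

/* What exactly one .c file would contain: the single definition. */
int first;

/* Same as f() in file1.c. */
void f(void)
{
    first = 2;
}

/* Hypothetical stand-in for main() in file2.c. */
int run_example(void)
{
    first = 1;
    f();            /* overwrites the one shared variable */
    return first;   /* 2 */
}
```

With this layout the program links with or without -fno-common, because only one translation unit ever defines first.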
WARNING:
Now let's say you had two global uninitialized arrays of different sizes:
// file1.c
int first[10];
// file2.c
int first[20];
Well guess what, compiling them with gcc -Wall file1.c file2.c produces no warnings or errors and the variable was made common even though it's differently sized!!!
//objdump from file1.c:
0000000000000028 O *COM* 0000000000000020 first
//objdump from file2.c:
0000000000000050 O *COM* 0000000000000020 first
This is one of the dangers of global variables.
**If you look at an objdump of the *.o files (you have to compile with gcc -c to generate them) you'll see first placed in the common (*COM*) section:
mike@mike-VirtualBox:~/C$ objdump -t file2.o
a.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 file2.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .rodata 0000000000000000 .rodata
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .eh_frame 0000000000000000 .eh_frame
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000004 O *COM* 0000000000000004 first
0000000000000000 g F .text 0000000000000039 main
0000000000000000 *UND* 0000000000000000 f
0000000000000000 *UND* 0000000000000000 printf

Resources