How do I know the actual entry point address of a PIE program on Linux/Android?
I can read the entry point address using readelf -l, but for a elf compiled and linked with -pie or -fPIE, the actual entry point address will be different from it. How can I get such address at run time? That is, knowing where the program is loaded into memory.
The entry point of a program is always available to it as the address of
the symbol _start.
main.c
#include <stdio.h>
extern char _start;
int main()
{
printf("&_start = %p\n",&_start);
return 0;
}
Compile and link -no-pie:
$ gcc -no-pie main.c
Then we see:
$ nm a.out | grep '_start'
0000000000601030 B __bss_start
0000000000601020 D __data_start
0000000000601020 W data_start
w __gmon_start__
0000000000600e10 t __init_array_start
U __libc_start_main##GLIBC_2.2.5
0000000000400400 T _start
^^^^^^^^^^^^^^^
and:
$ readelf -h a.out | grep Entry
Entry point address: 0x400400
and:
$ ./a.out
&_start = 0x400400
Compile and link -pie:
$ gcc -pie main.c
Then we see:
$ nm a.out | grep '_start'
0000000000201010 B __bss_start
0000000000201000 D __data_start
0000000000201000 W data_start
w __gmon_start__
0000000000200db8 t __init_array_start
U __libc_start_main##GLIBC_2.2.5
0000000000000540 T _start
^^^^^^^^^^^^
and:
$ readelf -h a.out | grep Entry
Entry point address: 0x540
and:
$ ./a.out
&_start = 0x560a8dc5e540
^^^
So the PIE program is entered at its nominal entry point 0x540 plus 0x560a8dc5e000.
Related
I have a big text file that I want to include in a C program. I could just make it a string literal but it's pretty big and that would be cumbersome. So I'm currently linking like this:
$ ld -r -b binary -o /tmp/stuff.o /tmp/stuff.txt
$ clang -o myprogram main.o /tmp/stuff.o
Objdump output:
$ objdump -t /tmp/stuff.o
/tmp/stuff.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000006 g *ABS* 0000000000000000 _binary__tmp_stuff_txt_size
0000000000000006 g .data 0000000000000000 _binary__tmp_stuff_txt_end
0000000000000000 g .data 0000000000000000 _binary__tmp_stuff_txt_start
In the code, I do this (gotten from this question):
extern char _binary__tmp_stuff_txt_start[];
extern char _binary__tmp_stuff_txt_size[];
int f(void) {
size_t size = (size_t)_binary__tmp_stuff_txt_size;
do_stuff(size, _binary__tmp_stuff_txt_start);
}
Everything works great, but when I compile with GCC instead of Clang, it segfaults. Looking at it in GDB, the size variable initialized like this size_t size = (size_t)_binary__tmp_stuff_txt_size; is garbage. It seems that when GCC links, it passes the -pie flag to ld but Clang doesn't. I could fix this by just passing -no-pie to GCC, but it seems kindof sad that doing something so simple would prevent using PIE. Is there something I should change to make this work?
// foo.c
int main() { return 0; }
When I compiled the code above I noticed some symbols located in *ABS*:
$ gcc foo.c
$ objdump -t a.out | grep ABS
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000000000 l df *ABS* 0000000000000000 foo.c
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000000000 l df *ABS* 0000000000000000
Looks like they're some debug symbols but isn't debug info are stored in somewhere like .debug_info section?
According to man objdump:
*ABS* if the section is absolute (ie not connected with any section)
I don't understand it since no example is given here.
Question here shows an interesting way to pass some extra symbols in *ABS* by --defsym. But I think it could be easier by passing macros.
So what is this *ABS* section and when would someone use it?
EDIT:
Absolute symbols don't get relocated, their virtual addresses (0000000000000000 in the example you gave) are fixed.
I wrote a demo but it seems that the addresses of absolute symbols can be modified.
// foo.c
#include <stdio.h>
extern char foo;
int main()
{
printf("%p\n", &foo);
return 0;
}
$ gcc foo.c -Wl,--defsym,foo=0xbeef -g
$ objdump -t a.out | grep ABS
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000000000 l df *ABS* 0000000000000000 foo.c
0000000000000000 l df *ABS* 0000000000000000 crtstuff.c
0000000000000000 l df *ABS* 0000000000000000
000000000000beef g *ABS* 0000000000000000 foo
# the addresses are not fixed
$ ./a.out
0x556e06629eef
$ ./a.out
0x564f0d7aeeef
$ ./a.out
0x55c2608dceef
# gdb shows that before entering main(), &foo == 0xbeef
$ gdb a.out
(gdb) p &foo
$1 = 0xbeef <error: Cannot access memory at address 0xbeef>
(gdb) br main
Breakpoint 1 at 0x6b4: file foo.c, line 7.
(gdb) r
Starting program: /home/user/a.out
Breakpoint 1, main () at foo.c:7
7 printf("%p", &foo);
(gdb) p &foo
$2 = 0x55555555feef <error: Cannot access memory at address 0x55555555feef>
If you look at other symbols you might find an index (or section name if the reader does the mapping for you) in place of *ABS*. This is a section index in the section headers table. It points to the section header of a section the symbol is defined in (or SHN_UNDEF (zero) if it is undefined in the object you are looking at). So the value (virtual address) of a symbol will be adjusted by the same value its containing section is adjusted during loading. (This process is called relocation.) Not so for absolute symbols (having special value SHN_ABS as their st_shndx). Absolute symbols don't get relocated, their virtual addresses (0000000000000000 in the example you gave) are fixed.
Such absolute symbols are sometimes used to store some meta information. In particular, the compiler can create symbols with symbol names equivalent to the names of translation units it compiles. Such symbols aren't needed for linking or running the program, they are just for humans and binary processing tools.
As for your question w.r.t the reason this isn't stored in .debug_info section (and why this info is emitted even though no debug switches were specified), the answer is that it is a separate thing; it is just the symbol table (.symtab). It is also needed for debugging, sure, but it's primary purpose is linking of object (.o) files. By default it is preserved in linked executables/libraries. You can get rid of it with strip.
Much of what I wrote here is in man 5 elf.
I don't think doing what you are doing (with --defsym) is supported/supposed to work with dynamic linking. Looking at the compiler output (gcc -S -masm=intel), I see this
lea rsi, foo[rip]
Or, if we look at objdump -M intel -rD a.out (linking with -q to preserve relocations), we see the same thing: rip-relative addressing is used to get the address of foo.
113d: 48 8d 35 ab ad 00 00 lea rsi,[rip+0xadab] # beef <foo>
1140: R_X86_64_PC32 foo-0x4
The compiler doesn't know that it's going to be an absolute symbol, so it produces the code it does (as for a normal symbol). rip is the instruction pointer, so it depends on the base address of the segment containing .text after the program is mapped into memory by ld.so.
I found this answer shedding light on the proper use-case for absolute symbols.
I want to remove unused functions from code while compiling. Then I write some code (main.c):
#include <stdio.h>
const char *get1();
int main()
{
puts( get1() );
}
and getall.c:
const char *get1()
{
return "s97symmqdn-1";
}
const char *get2()
{
return "s97symmqdn-2";
}
const char *get3()
{
return "s97symmqdn-3";
}
Makefile
test1 :
rm -f a.out *.o *.a
gcc -ffunction-sections -fdata-sections -c main.c getall.c
ar cr libgetall.a getall.o
gcc -Wl,--gc-sections main.o -L. -lgetall
After run make test1 && objdump --sym a.out | grep get , I only find the next 2 lines output:
0000000000000000 l df *ABS* 0000000000000000 getall.c
0000000000400535 g F .text 000000000000000b get1
I guess the get2 and get3 was removed. But when I open the a.out by vim, I found s97symmqdn-1 s97symmqdn-2 s97symmqdn-3 exists.
Is the function get2 get3 removed really ? How I can remove the symbol s97symmqdn-2 s97symmqdn-3 ? Thank you for your reply.
My system is centos7 and gcc version is 4.8.5
The compilation options -ffunction-sections -fdata-sections and linkage option --gc-sections
are working correctly in your example. Your static library is superfluous, so it can
be simplified to:
$ gcc -ffunction-sections -fdata-sections -c main.c getall.c
$ gcc -Wl,--gc-sections main.o getall.o -Wl,-Map=mapfile
in which I'm also asking for the linker's mapfile.
The unused functions get2 and get3 are absent from the executable:
$ nm a.out | grep get
0000000000000657 T get1
and the mapfile shows that the unused function-sections .text.get2 and .text.get3 in which get2 and get3 are
respectively defined were discarded in the linkage:
mapfile (1)
...
Discarded input sections
...
.text.get2 0x0000000000000000 0xd getall.o
.text.get3 0x0000000000000000 0xd getall.o
...
Nevertheless, as you found, all three of the string literals "s97symmqdn-(1|2|3)"
are in the program:
$ strings a.out | egrep 's97symmqdn-(1|2|3)'
s97symmqdn-1
s97symmqdn-2
s97symmqdn-3
That is because -fdata-sections applies just to the same data objects that
__attribute__ ((__section__("name"))) applies to1, i.e. to the definitions
of variables that have static storage duration. It is not applied to anonymous string literals like your
"s97symmqdn-(1|2|3)". They are all just placed in the .rodata section as usual,
and there we find them:
$ objdump -s -j .rodata a.out
a.out: file format elf64-x86-64
Contents of section .rodata:
06ed 73393773 796d6d71 646e2d31 00733937 s97symmqdn-1.s97
06fd 73796d6d 71646e2d 32007339 3773796d symmqdn-2.s97sym
070d 6d71646e 2d3300 mqdn-3.
--gc-sections does not allow the linker to discard .rodata from the program
because it is not an unused section: it contains "s97symmqdn-1", referenced
in the program by get1 as well as the unreferenced strings "s97symmqdn-2"
and "s97symmqdn-3"
Fix
To get these three string literals separated into distinct data sections, you
need to assign them to distinct named objects, e.g.
getcall.c (2)
const char *get1()
{
static const char s[] = "s97symmqdn-1";
return s;
}
const char *get2()
{
static const char s[] = "s97symmqdn-2";
return s;
}
const char *get3()
{
static const char s[] = "s97symmqdn-3";
return s;
}
If we recompile and relink with that change, we see:
mapfile (2)
...
Discarded input sections
...
.text.get2 0x0000000000000000 0xd getall.o
.text.get3 0x0000000000000000 0xd getall.o
.rodata.s.1797
0x0000000000000000 0xd getall.o
.rodata.s.1800
0x0000000000000000 0xd getall.o
...
Now there are two new discarded data-sections, which contain
the two string literals we don't need, as we can see in the object file:
$ objdump -s -j .rodata.s.1797 getall.o
getall.o: file format elf64-x86-64
Contents of section .rodata.s.1797:
0000 73393773 796d6d71 646e2d32 00 s97symmqdn-2.
and:
$ objdump -s -j .rodata.s.1800 getall.o
getall.o: file format elf64-x86-64
Contents of section .rodata.s.1800:
0000 73393773 796d6d71 646e2d33 00 s97symmqdn-3.
Only the referenced string "s97symmqdn-1" now appears anywhere in the program:
$ strings a.out | egrep 's97symmqdn-(1|2|3)'
s97symmqdn-1
and it is the only string in the program's .rodata:
$ objdump -s -j .rodata a.out
a.out: file format elf64-x86-64
Contents of section .rodata:
06f0 73393773 796d6d71 646e2d31 00 s97symmqdn-1.
[1] Likewise, -function-sections has the same effect as qualifying the
definition of every function foo with __attribute__ ((__section__(".text.foo")))
I have a sample program like this:
#include <stdio.h>
#if 1
#define FOR_EXPORT __attribute__ ((visibility("hidden")))
#else
#define FOR_EXPORT
#endif
FOR_EXPORT void mylocalfunction1(void)
{
printf("function1\n");
}
void mylocalfunction2(void)
{
printf("function2\n");
}
void mylocalfunction3(void)
{
printf("function3\n");
}
void printMessage(void)
{
printf("Running the function exported from the shared library\n");
}
And compile it using
gcc -shared -fPIC -fvisibility=hidden -o libdefaultvisibility.so defaultvisibility.c
Now after compilation I do:
$ nm libdefaultvisibility.so
nm libdefaultvisibility.so
0000000000000eb0 t _mylocalfunction1
0000000000000ed0 t _mylocalfunction2
0000000000000ef0 t _mylocalfunction3
0000000000000f10 t _printMessage
U _printf
U dyld_stub_binder
Which means as far as I can tell that despite -fvisibility=hidden all symbols get exported. The book I was following claimed that only the function marked with FOR_EXPORT should be exported.
I looked oup several other resources, but for the simple test I'm doing -fvisibility=hidden should be sufficient.
My clang version:
$ clang -v
clang -v
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
You're misunderstanding the output of nm. Scroll through man nm and you'll you
read that the t flag means the symbol is a local (static) symbol in
the text section. The linker can't see it. If it were global (external)
the flag would be T. So all four of your functions are local.
Contrast:
$ clang -shared -fPIC -fvisibility=hidden -o libdefaultvisibility.so defaultvisibility.c
$ nm libdefaultvisibility.so | grep ' t '
0000000000000570 t deregister_tm_clones
0000000000000600 t __do_global_dtors_aux
0000000000200e08 t __do_global_dtors_aux_fini_array_entry
0000000000000640 t frame_dummy
0000000000200e00 t __frame_dummy_init_array_entry
0000000000000670 t mylocalfunction1
0000000000000690 t mylocalfunction2
00000000000006b0 t mylocalfunction3
00000000000006d0 t printMessage
00000000000005b0 t register_tm_clones
with dropping the -fvisibility=hidden:
$ clang -shared -fPIC -o libdefaultvisibility.so defaultvisibility.c
$ nm libdefaultvisibility.so | grep ' t '
0000000000000600 t deregister_tm_clones
0000000000000690 t __do_global_dtors_aux
0000000000200e08 t __do_global_dtors_aux_fini_array_entry
00000000000006d0 t frame_dummy
0000000000200e00 t __frame_dummy_init_array_entry
0000000000000700 t mylocalfunction1
0000000000000640 t register_tm_clones
$ nm libdefaultvisibility.so | grep ' T '
0000000000000780 T _fini
00000000000005b0 T _init
0000000000000720 T mylocalfunction2
0000000000000740 T mylocalfunction3
0000000000000760 T printMessage
Then only the explicitly hidden mylocalfunction1 remains local, and the
other three are now global.
You should not expect that a symbol marked with __attribute__ ((visibility("hidden")))
will be exported by a shared library in any circumstances. The attribute means precisely
that it will not be, whether it is applied explicitly to a symbol, as in this case,
or acquired by default in the presence of the linker option -fvisibility=hidden.
If you want to export just that one function in the example by means of a visibility attribution
you would have:
#define FOR_EXPORT __attribute__ ((visibility("default")))
Then:
$ clang -shared -fPIC -fvisibility=hidden -o libdefaultvisibility.so defaultvisibility.c
$ nm libdefaultvisibility.so | grep ' T '
0000000000000720 T _fini
0000000000000550 T _init
00000000000006a0 T mylocalfunction1
It is global, because the explicit attribition overrides the commandline option,
and all your other functions are local. Perhaps confusingly, default visibility
is always public.
And you could accomplish this without resorting to visibility attributions - which are
not portable - simply declaring all the functions that you don't want to export as static. Then the compiler
would not expose them to the linker in the first place:
foo.c
#include <stdio.h>
void mylocalfunction1(void)
{
printf("function1\n");
}
static void mylocalfunction2(void)
{
printf("function2\n");
}
static void mylocalfunction3(void)
{
printf("function3\n");
}
static void printMessage(void)
{
printf("Running the function exported from the shared library\n");
}
With which you get again:-
$ clang -shared -fPIC -o libfoo.so foo.c
$ nm libfoo.so | grep ' T '
00000000000006c0 T _fini
0000000000000550 T _init
00000000000006a0 T mylocalfunction1
Although the distinction does not make itself felt in your example you
should understand that while a local/static symbol is not seen by the linker and (therefore) is unavailable for dynamic linkage, a global/external symbol
may or may not be available for dynamic linkage. visibility
controls the availability of global symbols for dynamic linkage, only.
According to GCC Wiki on Visibility, you should:
Use nm -C -D on the outputted DSO [Dynamic Shared Object] to compare before and after to see
the difference it makes.
As stated on nm manual:
-D will display the dynamic symbols rather than the normal symbols
If I compile your code exactly as you did I get the following objects:
$ nm -C -D libdefaultvisibility.so
nm -C -D libdefaultvisibility.so
0000000000200a68 B __bss_start
w __cxa_finalize
0000000000200a68 D _edata
0000000000200a70 B _end
00000000000006c8 T _fini
w __gmon_start__
0000000000000518 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w _Jv_RegisterClasses
U puts
And if I compile it without the -fvisibility=hidden option I get the objects:
$ nm -C -D libdefaultvisibility.so
nm -C -D libdefaultvisibility.so
0000000000200ae8 B __bss_start
w __cxa_finalize
0000000000200ae8 D _edata
0000000000200af0 B _end
0000000000000748 T _fini
w __gmon_start__
00000000000005a0 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w _Jv_RegisterClasses
0000000000000712 T mylocalfunction2
0000000000000724 T mylocalfunction3
0000000000000736 T printMessage
U puts
I am reading the book Hacking, the art of exploitation. In the book there is a section that explain the use of .dtors and .ctors.
I'm trying to reproduce one of the exercises of the book but in my executable I do not have this sections. At first I thought the problem was that I was compiling for 64-bit, but now I'm compiling for 32-bit and .dtors and .ctors are still not appearing in the section table. Here is the code:
#include <stdio.h>
#include <stdlib.h>
static void
miConstructor(void) __attribute__ ((constructor));
static void
miDestructor(void) __attribute__ ((destructor));
int
main(void) {
printf("En main() \n");
return 0;
}
void
miConstructor(void) {
printf("En el constructor\n");
}
void
miDestructor(void) {
printf("En el destructor\n");
}
I am compiling with:
gcc -m32 -o a.out dtors_example.c
This is the output of nm:
080495f0 d _DYNAMIC
080496e4 d _GLOBAL_OFFSET_TABLE_
080484dc R _IO_stdin_used
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w _Jv_RegisterClasses
080485d8 r __FRAME_END__
080495ec d __JCR_END__
080495ec d __JCR_LIST__
08049704 D __TMC_END__
08049704 A __bss_start
080496fc D __data_start
080483c0 t __do_global_dtors_aux
080495e4 t __do_global_dtors_aux_fini_array_entry
08049700 D __dso_handle
080495dc t __frame_dummy_init_array_entry
w __gmon_start__
080484ba T __i686.get_pc_thunk.bx
080495e4 t __init_array_end
080495dc t __init_array_start
08048450 T __libc_csu_fini
08048460 T __libc_csu_init
U __libc_start_main##GLIBC_2.0
08049704 A _edata
08049708 A _end
080484c0 T _fini
080484d8 R _fp_hw
080482b8 T _init
08048320 T _start
08049704 b completed.5730
080496fc W data_start
08048350 t deregister_tm_clones
080483e0 t frame_dummy
0804840c T main
08048428 t miConstructor
0804843c t miDestructor
U puts##GLIBC_2.0
08048380 t register_tm_clones
The output of objdump neither show .dtors or .ctors
Maybe the sections __init_array_end, __init_array_start or __do_global_dtors_aux are related with the behavior of .ctors and .dtors?
The issue is likely gcc. under gcc 4.7 version can generate .ctors sections, but gcc 4.7 use .init_array instead of .ctors. you can confirm this by doing command which list below.
objdump -dr -j .ctors a.out.if no sections found, try objdump -dr -j .init_array a.out
or you can do this readelf -S a.out to list all sections. then you'll find .ctors or(and) .init_array.
Use objdump command with -x option to see the full available header info, symbol table and relocation entries.
objdump -x ./yourcommand