Location of destructors in elf files: not where it should be?

Location of destructors in elf files: not where it should be? - c

I register a token destructor function with
static void cleanup __attribute__ ((destructor));
The function just prints a debug message; the token program runs fine (main() just prints another message; token function prints upon exit).
When I look at the file with
nm ./a.out,
I see: 
08049f10 d __DTOR_END__
08049f0c d __DTOR_LIST__
However, the token destructor function's address should be at 0x08049f10 - an address which contains 0, indicating end of destructor list, as I can check using:
objdump -s ./a.out
At 0x08049f0c, I see 0xffffffff, as is expected for this location. It is my understanding that what I see in the elf file would mean that no destructor was registered; but it is executed with one.
If someone could explain, I'd appreciate. Is this part of the security suite to prevent inserting malicious destructors? How does the compiler keep track of the destructors' addresses?
My system:
Ubuntu 12.04.
elf32-i386
Kernel: 3.2.0-30-generic-pae
gcc version: 4.6.3

DTOR_LIST is the start of a table of desctructors. Have a look which section it is in (probably .dtors):
~> objdump -t test | grep DTOR_LIST
0000000000600728 l O .dtors 0000000000000000 __DTOR_LIST__
Then dump that section with readelf (or whatever):
~> readelf --hex-dump=.dtors test
Hex dump of section '.dtors':
0x00600728 ffffffff ffffffff 1c054000 00000000 ..........#.....
0x00600738 00000000 00000000 ........
Which in my test case contains a couple of presumably -1, a real pointer, and then zero termination.

Related

cant view C structures from an assembly program? [duplicate]

I recently started learning assembly language for the Intel x86-64 architecture using YASM. While solving one of the tasks suggested in a book (by Ray Seyfarth) I came to following problem:
When I place some characters into a buffer in the .bss section, I still see an empty string while debugging it in gdb. Placing characters into a buffer in the .data section shows up as expected in gdb.
segment .bss
result resb 75
buf resw 100
usage resq 1
segment .data
str_test db 0, 0, 0, 0
segment .text
global main
main:
mov rbx, 'A'
mov [buf], rbx ; LINE - 1 STILL GET EMPTY STRING AFTER THAT INSTRUCTION
mov [str_test], rbx ; LINE - 2 PLACES CHARACTER NICELY.
ret
In gdb I get:
after LINE 1: x/s &buf, result - 0x7ffff7dd2740 <buf>: ""
after LINE 2: x/s &str_test, result - 0x601030: "A"
It looks like &buf isn't evaluating to the correct address, so it still sees all-zeros. 0x7ffff7dd2740 isn't in the BSS of the process being debugged, according to its /proc/PID/maps, so that makes no sense. Why does &buf evaluate to the wrong address, but &str_test evaluates to the right address? Neither are "global" symbols, but we did build with debug info.
Tested with GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10 on x86-64 Ubuntu 15.10.
I'm building with
yasm -felf64 -Worphan-labels -gdwarf2 buf-test.asm
gcc -g buf-test.o -o buf-test
nm on the executable shows the correct symbol addresses:
$ nm -n buf-test # numeric sort, heavily edited to omit symbols from glibc
...
0000000000601028 D __data_start
0000000000601038 d str_test
...
000000000060103c B __bss_start
0000000000601040 b result
000000000060108b b buf
0000000000601153 b usage
(editor's note: I rewrote a lot of the question because the weirdness is in gdb's behaviour, not the OP's asm!).

glibc includes a symbol named buf, as well.
(gdb) info variables ^buf$
All variables matching regular expression "^buf$":
File strerror.c:
static char *buf;
Non-debugging symbols:
0x000000000060108b buf <-- this is our buf
0x00007ffff7dd6400 buf <-- this is glibc's buf
gdb happens to choose the symbol from glibc over the symbol from the executable. This is why ptype buf shows char *.
Using a different name for the buffer avoids the problem, and so does a global buf to make it a global symbol. You also wouldn't have a problem if you wrote a stand-alone program that didn't link libc (i.e. define _start and make an exit system call instead of running a ret)
Note that 0x00007ffff7dd6400 (address of buf on my system; different from yours) is not actually a stack address. It visually looks like a stack address, but it's not: it has a different number of f digits after the 7. Sorry for that confusion in comments and an earlier edit of the question.
Shared libraries are also loaded near the top of the low 47 bits of virtual address space, near where the stack is mapped. They're position-independent, but a library's BSS space has to be in the right place relative to its code. Checking /proc/PID/maps again more carefully, gdb's &buf is in fact in the rwx block of anonymous memory (not mapped to any file) right next to the mapping for libc-2.21.so.
7ffff7a0f000-7ffff7bcf000 r-xp 00000000 09:7f 17031175 /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7bcf000-7ffff7dcf000 ---p 001c0000 09:7f 17031175 /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dcf000-7ffff7dd3000 r-xp 001c0000 09:7f 17031175 /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dd3000-7ffff7dd5000 rwxp 001c4000 09:7f 17031175 /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dd5000-7ffff7dd9000 rwxp 00000000 00:00 0 <--- &buf is in this mapping
...
7ffffffdd000-7ffffffff000 rwxp 00000000 00:00 0 [stack] <---- more FFs before the first non-FF than in &buf.
A normal call instruction with a rel32 encoding can't reach a library function, but it doesn't need to because GNU/Linux shared libraries have to support symbol interposition, so calls to library functions actually jump to the PLT, where an indirect jmp (with a pointer from the GOT) goes to the final destination.

Why are instructions addresses on the top of the memory user space contrary to the linux process memory layout?

#include <stdio.h>
void func() {}
int main() {
printf("%p", &func);
return 0;
}
This program outputted 0x55c4cda9464a
Supposing that func will be stored in the .text section, and according to this figure, from CS:APP:
I suppose that the address of func would be somewhere near the starting address of the .text section, but this address is somewhere in the middle. Why is this the case? Local variables stored on the stack have addresses near 2^48 - 1, but I tried to disassemble different C codes and the instructions were always located somewhere around that 0x55... address.

gcc, when configured with --enable-default-pie1 (which is the default), produces Position Independent Executables(PIE). Which means the load address isn't same as what linker specified at compile-time (0x400000 for x86_64). This is a security mechanism so that Address Space Layout Randomization (ASLR) 2 can be enabled. That is, gcc compiles with -pie option by default.
If you compile with -no-pie option (gcc -no-pie file.c), then you can see the address of func is closer to 0x400000.
On my system, I get:
$ gcc -no-pie t.c
$ ./a.out
0x401132
You can also check the load address with readelf:
$ readelf -Wl a.out | grep LOAD
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x000478 0x000478 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x0001f5 0x0001f5 R E 0x1000
LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x000158 0x000158 R 0x1000
LOAD 0x002e10 0x0000000000403e10 0x0000000000403e10 0x000228 0x000230 RW 0x1000
1 you can check this with gcc --verbose.
2 You may also notice that address printed by your program is different in each run. That's because of ASLR. You can disable it with:
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
ASLR is enabled on Linux by default.

Is it possible to make a hardcoding with the help of the command objcopy

I'm working on Linux and I've just heard that there was a command objcopy, I've found the relative command on my x86_64 PC: x86_64-linux-gnu-objcopy.
With its help, I can convert a file into an obj file: x86_64-linux-gnu-objcopy -I binary -O elf64-x86-64 custom.config custom.config.o
The file custom.config is a human-readable file. It contains two lines:
name titi
password 123
Now I can execute objdump -x -s custom.config.o to check its information.
custom.config.o: file format elf64-little
custom.config.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00000017 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_custom_config_start
0000000000000017 g .data 0000000000000000 _binary_custom_config_end
0000000000000017 g *ABS* 0000000000000000 _binary_custom_config_size
Contents of section .data:
0000 6e616d65 20746974 690a7061 7373776f name titi.passwo
0010 72642031 32330a rd 123.
As all we know, we can open, read or write a file, such as custom.config in any C/C++ project. Now, I'm thinking if it's possible to use this obj file custom.config.o immediately in a C/C++ project. For example, is it possible to read the content of the file custom.config.o immediately without calling the I/O functions, such as open, read or write. If possible, I think this might become some kind of hardcoding style and avoid calling the I/O functions?

Even if I tried this on Win10 with MinGW (MinGW-W64 project, GCC 8.1.0), this should work for you with only minor adaptions.
As you see from the info objdump gave you, the file's contents is placed in the .data section that is the common section for non-constant variables.
And some symbols were defined for it. You can declare these symbols in your C source.
The absolute value _binary_custom_config_size is special, because it is marked *ABS*. Currently I know no other way to obtain its value than to declare a variable of any type and take its address.
This is my show_config.c:
#include <stdio.h>
#include <string.h>
extern const char _binary_custom_config_start[];
extern const char _binary_custom_config_size;
int main(void) {
size_t size = (size_t)&_binary_custom_config_size;
char config[size + 1];
strncpy(config, _binary_custom_config_start, size);
config[size] = '\0';
printf("config = \"%s\"\n", config);
return 0;
}
Because the "binary" file (actually a text) has no final '\0' character, you need to append one to get a correctly terminated C string.
You could as well declare _binary_custom_config_end and use it to calculate the size, or as a limit.
Building everything goes like this (I used the -g option to debug):
$ objcopy -I binary -O elf64-x86-64 -B i386 custom.config custom.config.o
$ gcc -Wall -Wextra -pedantic -g show_config.c custom.config.o -o show_config
And the output shows the success:
$ show_config.exe
config = "name titi
password 123"
If you need the file's contents in another section, you will add the option to rename the section to objcopy's call. Add any flag you need, the example shows .rodata that is used for read-only data:
--rename-section .data=.rodata,alloc,load,readonly,data,contents

How to load library defined symbols to a specified location?

The test is on Ubuntu 12.04, 32-bit, with gcc 4.6.3.
Basically I am doing some binary manipulation work on ELF binaries, and what I have to do now is to assemble a assembly program and guarantee the libc symbols are loaded to a predefined address by me.
Let me elaborate it in an simple example.
Suppose in the original code, libc symbols stdout#GLIBC_2.0 is used.
#include <stdio.h>
int main() {
FILE* fout = stdout;
fprintf( fout, "hello\n" );
}
When I compile it and check the symbol address using these commands:
gcc main.c
readelf -s a.out | grep stdout
I got this:
0804a020 4 OBJECT GLOBAL DEFAULT 25 stdout#GLIBC_2.0 (2)
0804a020 4 OBJECT GLOBAL DEFAULT 25 stdout##GLIBC_2.0
and the .bss section is like this:
readelf -S a.out | grep bss
[25] .bss NOBITS 0804a020 001014 00000c 00 WA 0 0 32
Now what I am trying to do is to load the stdout symbol in a predefined address, so I did this:
echo "stdout = 0x804a024;" > symbolfile
gcc -Wl,--just-symbols=symbolfile main.c
Then when I check the .bss section and symbol stdout, I got this:
[25] .bss NOBITS 0804a014 001014 000008 00 WA 0 0 4
4: 0804a024 0 NOTYPE GLOBAL DEFAULT ABS stdout
49: 0804a024 0 NOTYPE GLOBAL DEFAULT ABS stdout
It seems that I didn't successfully load the symbol stdout##GLIBC_2.0, but just a wired stdout. (I tried to write stdout##GLIBC_2.0 in symbolfile, but it can't compile... )
It seems that as I didn't make it, the beginning address of .bss section has also changed, which makes the address of stdout symbol in a non-section area. During runtime, it throws a segmentation fault when loading from 0x804a024.
Could anyone help me on how to successfully load the library symbol at a predefined address? Thanks!

How do I see the memory locations of static variables within .bss?

Supposing I have a static variable declared in gps_anetenova_m10478.c as follows:
static app_timer_id_t m_gps_response_timeout_timer_id;
I have some sort of buffer overrun bug in my code and at some point a write to the variable right before m_gps_response_timeout_timer_id in memory is overwriting it.
I can find out where m_gps_response_timeout_timer_id is in memory using the 'Expressions' view in Eclipse's GDB client. Just enter &m_gps_response_timeout_timer_id. But how do I tell which variable is immediately before it in memory?
Is there a way to get this info into the .map file that ld produces? At the moment I only see source files:
.bss 0x000000002000011c 0x0 _build/debug_leds.o
.bss 0x000000002000011c 0x11f8 _build/gps_antenova_m10478.o
.bss 0x0000000020001314 0x161c _build/gsm_ublox_sara.o

I'll be honest, I don't know enough about Eclipse to give an easy way within Eclipse to get this. The tool you're probably looking for is either objdump or nm. An example with objdump is to simply run objdump -x <myELF>. This will then return all symbols in the file, which section they're in, and their addresses. You'll then have to manually search for the variable in which you're interested based on the addresses.
objdump -x <ELFfile> will give output along the lines of the following:
000120d8 g F .text 0000033c bit_string_copy
00015ea4 g O .bss 00000004 overflow_bit
00015e24 g .bss 00000000 __bss_start
00011ce4 g F .text 0000003c main
00014b6c g F .text 0000008c integer_and
The first column is the address, the fourth the section and the fifth the length of that field.
nm <ELFfile> gives the following:
00015ea8 B __bss_end
00015e24 B __bss_start
0000c000 T _start
00015e20 D zero_constant
00015e24 b zero_constant_itself
The first column is the address and the second the section. D/d is data, B/b is BSS and T/t is text. The rest can be found in the manpage. nm also accepts the -n flag to sort the lines by their numeric address.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight