I have written a snippet with a memory problem in dynamically allocated memory; when it is compiled with the -lefence option, Electric Fence seems to have no effect. Here is the code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int *a = (int *)malloc(2*sizeof(int));
    for(int i = 0; i <=2; ++i){   /* deliberate overrun: i == 2 is out of bounds */
        a[i] = i;
        printf ("%d\n",a[i]);
    }
    free(a);
    return 0;
}
And the compilation options:
gcc -g3 -Wall -std=c99 outOfBound.c -lefence
The expected result is that when a.out is executed there is a core dump once i reaches 2 and a[i]=i is executed.
So why does -lefence have no effect?
I have also increased the upper bound in the loop to 9, but there is still no core dump triggered by Electric Fence. (Actually there is a core dump by default, but this might be due to the MALLOC_CHECK_ environment variable, since when I export MALLOC_CHECK_=0 there is no core dump any more.)
UPDATE: the whole result of nm -A a.out is as below:
a.out:08049f28 d _DYNAMIC
a.out:08049ff4 d _GLOBAL_OFFSET_TABLE_
a.out:0804864c R _IO_stdin_used
a.out: w _Jv_RegisterClasses
a.out:08049f18 d __CTOR_END__
a.out:08049f14 d __CTOR_LIST__
a.out:08049f20 d __DTOR_END__
a.out:08049f1c d __DTOR_LIST__
a.out:08048718 r __FRAME_END__
a.out:08049f24 d __JCR_END__
a.out:08049f24 d __JCR_LIST__
a.out:0804a01c A __bss_start
a.out:0804a014 D __data_start
a.out:08048600 t __do_global_ctors_aux
a.out:08048480 t __do_global_dtors_aux
a.out:0804a018 d __dso_handle
a.out: w __gmon_start__
a.out:080485f2 t __i686.get_pc_thunk.bx
a.out:00000000 a __init_array_end
a.out:00000000 a __init_array_start
a.out:080485f0 T __libc_csu_fini
a.out:08048580 T __libc_csu_init
a.out: U __libc_start_main
a.out:0804a01c A _edata
a.out:0804a024 A _end
a.out:0804862c T _fini
a.out:08048648 R _fp_hw
a.out:080483b4 T _init
a.out:08048450 T _start
a.out:0804a01c b completed.6159
a.out:0804a014 W data_start
a.out:0804a020 b dtor_idx.6161
a.out:080484e0 t frame_dummy
a.out: U free
a.out:08048504 T main
a.out: U malloc
a.out: U printf
(I am using a debian package electric-fence on Ubuntu 12.04 32bit, gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3)
Update(20140801):
With Electric Fence version 2.2.4 as packaged by Debian (testing branch, i.e. jessie), it works.
It is possible you are running into this (quoted from the Electric Fence manual page):
... it must increase the size of the allocation to a multiple of the
word size. In addition, the functions memalign() and valloc() must
honor explicit specifications on the alignment of the memory
allocation, and this, as well can only be implemented by increasing
the size of the allocation. Thus, there will be situations in which
the end of a memory allocation contains some padding space, and
accesses of that padding space will not be detected, even if they are
overruns.
Try exceeding the bounds a bit more, and see at what point the overrun detection kicks in.
If you compile and run the above program without linking it against the Electric Fence library, it may run without any segmentation fault.
So link it against the Electric Fence library and then run it under gdb with the following commands:
$ gdb a.out
....
(gdb)run
Starting program: /home/arif/sysprog-2017/processmgmt/nonlocalgoto/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Electric Fence 2.2 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>
0
1
Program received signal SIGSEGV, Segmentation fault.
0x000055555555484d in main (argc=1, argv=0x7fffffffe228) at temp.c:8
8 a[i] = i;
From the gdb output above you can see the source line that causes the problem. If you print the value of i at this point, it will be 2 :)
#include <stdio.h>
void func() {}
int main() {
printf("%p\n", (void *)&func);
return 0;
}
This program outputted 0x55c4cda9464a
Supposing that func will be stored in the .text section, and according to this figure, from CS:APP:
I supposed that the address of func would be somewhere near the starting address of the .text section, but this address is somewhere in the middle. Why is this the case? Local variables stored on the stack have addresses near 2^48 - 1, but when I disassembled different C programs the instructions were always located somewhere around that 0x55... address.
gcc, when configured with --enable-default-pie [1] (the default on many distributions), produces Position Independent Executables (PIE). This means the load address isn't the same as the one the linker assigned at link time (0x400000 for x86_64). It is a security mechanism that lets Address Space Layout Randomization (ASLR) [2] apply to the executable itself. In other words, such a gcc compiles with the -pie option by default.
If you compile with -no-pie option (gcc -no-pie file.c), then you can see the address of func is closer to 0x400000.
On my system, I get:
$ gcc -no-pie t.c
$ ./a.out
0x401132
You can also check the load address with readelf:
$ readelf -Wl a.out | grep LOAD
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x000478 0x000478 R 0x1000
LOAD 0x001000 0x0000000000401000 0x0000000000401000 0x0001f5 0x0001f5 R E 0x1000
LOAD 0x002000 0x0000000000402000 0x0000000000402000 0x000158 0x000158 R 0x1000
LOAD 0x002e10 0x0000000000403e10 0x0000000000403e10 0x000228 0x000230 RW 0x1000
[1] You can check this with gcc --verbose.
[2] You may also notice that the address printed by your program is different in each run. That's because of ASLR. You can disable it with:
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
ASLR is enabled on Linux by default.
I have an object file of a C program that prints hello world, just for this question.
I am trying to work out, using readelf, gdb or hexedit (I can't figure out which tool is the right one), where in the file the code of the function "main" starts.
Using readelf, I know that the symbols _start and main exist and the addresses at which they are mapped in virtual memory. I also know the size of the .text section and, of course, where the entry point is, i.e. the same address as the start of the text section.
The question is: where in the file does the code of the function "main" start? I thought it would be the entry point plus the offset of the text section, but as I understand it the data, bss and rodata sections should be set up before main runs, and they appear after the text section in readelf.
I also thought we should sum the sizes of all the entries before main in the symbol table, but I am not at all sure that is correct.
A follow-up question: if I want to replace the main function with NOP instructions, or plant a single ret instruction in my object file, how can I find the offset at which to do it using hexedit?
So, let's go through it step by step.
Start with this C file:
#include <stdio.h>
void printit()
{
puts("Hello world!");
}
int main(void)
{
printit();
return 0;
}
Since the comments suggest you are on x86, compile it as a 32-bit non-PIE executable like this:
$ gcc -m32 -no-pie -o test test.c
The -m32 option is needed because I am working on an x86-64 machine. As you already know, you can get the virtual memory address of main using readelf, objdump or nm, for example like this:
$ nm test | grep -w main
0804918d T main
Obviously, 804918d can not be an offset in the file that is just 15 kB big. You need to find the mapping between virtual memory addresses and file offsets. In a typical ELF file, the mapping is included twice. Once in a detailed form for linkers (as object files are also ELF files) and debuggers, and a second time in a condensed form that is used by the kernel for loading programs. The detailed form is the list of sections, consisting of section headers, and you can view it like this (the output is shortened a bit, to make the answer more readable):
$ readelf --section-headers test
There are 29 section headers, starting at offset 0x3748:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[...]
[11] .init PROGBITS 08049000 001000 000020 00 AX 0 0 4
[12] .plt PROGBITS 08049020 001020 000030 04 AX 0 0 16
[13] .text PROGBITS 08049050 001050 0001c1 00 AX 0 0 16
[14] .fini PROGBITS 08049214 001214 000014 00 AX 0 0 4
[15] .rodata PROGBITS 0804a000 002000 000015 00 A 0 0 4
[...]
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
Here you find that the .text section starts at (virtual) address 08049050 and has a size of 1c1 bytes, so it ends at address 08049211. The address of main, 804918d is in this range, so you know main is a member of the text section. If you subtract the base of the text section from the address of main, you find that main is 13d bytes into the text section. The section listing also contains the file offset where the data for the text section starts. It's 1050, so the first byte of main is at offset 0x1050 + 0x13d == 0x118d.
You can do the same calculation using program headers:
$ readelf --program-headers test
[...]
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00160 0x00160 R 0x4
INTERP 0x000194 0x08048194 0x08048194 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x002e8 0x002e8 R 0x1000
LOAD 0x001000 0x08049000 0x08049000 0x00228 0x00228 R E 0x1000
LOAD 0x002000 0x0804a000 0x0804a000 0x0019c 0x0019c R 0x1000
LOAD 0x002f0c 0x0804bf0c 0x0804bf0c 0x00110 0x00114 RW 0x1000
[...]
The second load line tells you that the area 08049000 (VirtAddr) to 08049228 (VirtAddr + MemSiz) is readable and executable, and loaded from offset 1000 in the file. So again you can calculate that the address of main is 18d bytes into this load area, so it has to reside at offset 0x118d inside the executable. Let's test that:
$ ./test
Hello world!
$ echo -ne '\xc3' | dd of=test conv=notrunc bs=1 count=1 seek=$((0x118d))
1+0 records in
1+0 records out
1 byte copied, 0.0116672 s, 0.1 kB/s
$ ./test
$
Overwriting the first byte of main with 0xc3, the opcode for return (near) on x86, causes the program to not output anything anymore.
_start normally belongs to a fixed module (a *.o file; the name differs between systems, but a common one is crt0.o, which is written in assembler). The execve(2) system call normally stores the arguments and the environment in the initial stack segment; the job of crt0.o is to prepare the initial C stack frame and call main(). Once main() ends, it is responsible for taking the return value from main and calling all the atexit() handlers, finishing with the _exit(2) system call.
The linking of crt0.o is normally transparent because you usually invoke the compiler to do the linking, so you don't have to add crt0.o as the first object module yourself; the compiler knows to do it. (Lately all this has grown considerably, since it depends on the architecture and on the ABI used to pass parameters between functions.)
If you run the compiler with the -v option, you'll get the exact command line it uses to call the linker, and with it a view of how your program's memory map is put together in its first stages.
I have written a simple program fabs.c to display the absolute value of a floating point number.
#include <stdio.h>
#include <math.h>
int main(void)
{
float f;
printf("Enter a floating-point number: ");
scanf("%f", &f);
printf("Its absolute value is %f.\n", fabs(f));
return 0;
}
The fabs() function requires including the math.h header file, but I compiled successfully without the -lm option.
gcc fabs.c -o fabs
Even man fabs says to link with -lm, so I don't understand why the program builds successfully without it.
If the manual says that you should link with -lm, then you should link with -lm. In this case your code is simple enough and the compiler is smart enough to inline it (because your system always uses the gcc built-in). Maybe it won't be able to in some cases. Some of the floating point function built-ins fall back to the library functions if they can't be trivially inlined (not fabs, but many others).
Manuals often tell you to do things that aren't strictly necessary in all cases because it's easier to say "do X" than to say "if you do A, B, but not C, you can maybe not have to do X, but please read the manual in the next version because we will add D and B will probably change, we'll never change A (unless we change our mind)".
By linking with -lm you ensure that your program will work on most reasonable systems for a reasonably foreseeable future. Even though it's not strictly necessary on one particular machine at this particular point in time, with this particular code, compiled with the particular options you had this time.
Because gcc optimizes some library calls: as it does with printf, gcc can replace calls to fabs with a built-in. To check, you can compile your source code with -fno-builtin to forbid gcc from doing so:
yoones@laptop:/tmp/toto$ gcc -fno-builtin main.c
/tmp/cc5fWozq.o: In function `main':
main.c:(.text+0x37): undefined reference to `fabs'
collect2: error: ld returned 1 exit status
You can also use nm to list your executable symbols:
yoones@laptop:/tmp/toto$ nm ./a.out
0000000000600a18 B __bss_start
0000000000600a18 b completed.6661
0000000000600a08 D __data_start
0000000000600a08 W data_start
00000000004004b0 t deregister_tm_clones
0000000000400530 t __do_global_dtors_aux
00000000006007e8 t __do_global_dtors_aux_fini_array_entry
0000000000600a10 D __dso_handle
00000000006007f8 d _DYNAMIC
0000000000600a18 D _edata
0000000000600a20 B _end
0000000000400644 T _fini
0000000000400550 t frame_dummy
00000000006007e0 t __frame_dummy_init_array_entry
00000000004007d8 r __FRAME_END__
00000000006009d0 d _GLOBAL_OFFSET_TABLE_
w __gmon_start__
0000000000400408 T _init
00000000006007e8 t __init_array_end
00000000006007e0 t __init_array_start
0000000000400650 R _IO_stdin_used
U __isoc99_scanf@@GLIBC_2.7
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
00000000006007f0 d __JCR_END__
00000000006007f0 d __JCR_LIST__
w _Jv_RegisterClasses
0000000000400640 T __libc_csu_fini
00000000004005d0 T __libc_csu_init
U __libc_start_main@@GLIBC_2.2.5
0000000000400576 T main
U printf@@GLIBC_2.2.5
00000000004004f0 t register_tm_clones
0000000000400480 T _start
0000000000600a18 D __TMC_END__
I found the following post (How to generate gcc debug symbol outside the build target?) on how to split the compiled file from its debugging symbols.
However, I cannot find any useful information in the debugging file.
For example,
My helloWorld code is:
#include<stdio.h>
int main(void) {
int a;
a = 5;
printf("The memory address of a is: %p\n", (void*) &a);
return 0;
}
I ran gcc -g -o hello hello.c
objcopy --only-keep-debug hello hello.debug
gdb -s main.debug -e main
In gdb, nothing I tried gives me any information on a: I cannot find its address, and I cannot find the address of the main function.
For example :
(gdb) info variables
All defined variables:
Non-debugging symbols:
0x0000000000400618 _IO_stdin_used
0x0000000000400710 __FRAME_END__
0x0000000000600e3c __init_array_end
0x0000000000600e3c __init_array_start
0x0000000000600e40 __CTOR_LIST__
0x0000000000600e48 __CTOR_END__
0x0000000000600e50 __DTOR_LIST__
0x0000000000600e58 __DTOR_END__
0x0000000000600e60 __JCR_END__
0x0000000000600e60 __JCR_LIST__
0x0000000000600e68 _DYNAMIC
0x0000000000601000 _GLOBAL_OFFSET_TABLE_
0x0000000000601028 __data_start
0x0000000000601028 data_start
0x0000000000601030 __dso_handle
0x0000000000601038 __bss_start
0x0000000000601038 _edata
0x0000000000601038 completed.6603
0x0000000000601040 dtor_idx.6605
0x0000000000601048 _end
Am I doing something wrong? Am I understanding the debug file incorrectly? Is there even a way to find out an address of compiled variable/function from a saved debugging information?
int a is a stack variable so it does not have a fixed address unless you are in a call to that specific function. Furthermore, each call to that function will allocate its own variable.
When we say "debugging symbols" we usually mean functions and global variables. A local variable is not a "symbol" in this context. In fact, if you compile with optimisations enabled int a would almost certainly be optimised to a register variable so it would not have an address at all, unless you forced it to be written to memory by doing some_function(&a) or similar.
You can find the address of main just by writing print main in GDB. This is because functions are implicitly converted to pointers in C when they appear in value context, and GDB's print uses C semantics.
I register a token destructor function with
static void cleanup(void) __attribute__ ((destructor));
The function just prints a debug message; the token program runs fine (main() just prints another message; token function prints upon exit).
When I look at the file with
nm ./a.out,
I see:
08049f10 d __DTOR_END__
08049f0c d __DTOR_LIST__
However, the token destructor function's address should be at 0x08049f10 - an address which contains 0, indicating end of destructor list, as I can check using:
objdump -s ./a.out
At 0x08049f0c, I see 0xffffffff, as is expected for this location. It is my understanding that what I see in the elf file would mean that no destructor was registered; but it is executed with one.
If someone could explain, I'd appreciate it. Is this part of the security suite to prevent inserting malicious destructors? How does the compiler keep track of the destructors' addresses?
My system:
Ubuntu 12.04.
elf32-i386
Kernel: 3.2.0-30-generic-pae
gcc version: 4.6.3
__DTOR_LIST__ is the start of a table of destructors. Have a look at which section it is in (probably .dtors):
~> objdump -t test | grep DTOR_LIST
0000000000600728 l O .dtors 0000000000000000 __DTOR_LIST__
Then dump that section with readelf (or whatever):
~> readelf --hex-dump=.dtors test
Hex dump of section '.dtors':
0x00600728 ffffffff ffffffff 1c054000 00000000 ..........@.....
0x00600738 00000000 00000000 ........
In my test case the section contains what is presumably -1 (the ffffffff ffffffff marker), one real pointer (0x40051c), and then a zero terminator.