I was recently experimenting with shared library injection on Linux and decided to write my own program to do it (instead of using, say, GDB to inject the library).
My program uses ptrace to overwrite the first 0x25 bytes of the target program (0x400000-0x400025) with assembly code that allocates space for the filename and calls dlopen. Once all of this is done, it restores the program's state and detaches from it.
Here's the assembly:
global inject_library
global nullsub
section .data
section .text
inject_library:
; rdi -> Pointer to malloc()
; rsi -> Pointer to free()
; rdx -> Pointer to dlopen()
; rcx -> Size of the path to the .so to load
; Create a new stack frame
push rbp
; Save rbx because we're using it as scratch space
push rbx
; Save addresses of free & dlopen on the stack
push rsi
push rdx
; Move the pointer to malloc into rbx
mov rbx, rdi
; Move the size of the path as the first argument to malloc
mov rdi, rcx
; Call malloc(so_path_size)
call rbx
; Stop so that we can see what's happening from the injector process
int 0x3
; Move the pointer to dlopen into rbx
pop rbx
; Move the malloc'd space (now containing the path) to rdi for the first argument
mov rdi, rax
; Push rax because it'll be overwritten
push rax
; Second argument to dlopen (RTLD_NOW)
mov rsi, 0x2
; Call dlopen(path_to_library, RTLD_NOW)
call rbx
; Pass control to the injector
int 0x3
; Finally, begin free-ing the malloc'd area
pop rdi
; Get the address of free into rbx
pop rbx
; Call free(path_to_library)
call rbx
; Restore rbx
pop rbx
; Destroy the stack frame
pop rbp
; We're done
int 0x3
retn
nullsub:
retn
There's also a C program which calls this assembly routine and uses ptrace to handle these breakpoints.
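For reference, the injector side looks roughly like this (a trimmed-down sketch rather than my exact code; error handling and the path-writing step are omitted, and the function name is mine):
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

/* Sketch: copy the assembled inject_library stub over the target's code at
   `addr`, point rip at it and resume.  Every int 0x3 in the stub stops the
   target with SIGTRAP, which the injector catches with waitpid(). */
static void run_stub(pid_t pid, unsigned long addr,
                     const unsigned char *stub, size_t len)
{
    struct user_regs_struct saved, regs;

    ptrace(PTRACE_ATTACH, pid, NULL, NULL);
    waitpid(pid, NULL, 0);
    ptrace(PTRACE_GETREGS, pid, NULL, &saved);        /* save original state */

    /* overwrite the target's code with the stub, one word at a time
       (the last word is zero-padded if len isn't a multiple of 8) */
    for (size_t i = 0; i < len; i += sizeof(long)) {
        long word = 0;
        size_t n = (len - i < sizeof(long)) ? len - i : sizeof(long);
        memcpy(&word, stub + i, n);
        ptrace(PTRACE_POKETEXT, pid, (void *)(addr + i), (void *)word);
    }

    regs = saved;
    regs.rip = addr;   /* rdi/rsi/rdx/rcx (malloc, free, dlopen, path length)
                          would also be loaded here; elided in this sketch   */
    ptrace(PTRACE_SETREGS, pid, NULL, &regs);

    ptrace(PTRACE_CONT, pid, NULL, NULL);
    waitpid(pid, NULL, 0);   /* stopped at the first int 0x3; here the real
                                injector writes the path into the malloc'd
                                buffer and continues through the remaining
                                breakpoints the same way (omitted)           */

    ptrace(PTRACE_SETREGS, pid, NULL, &saved);  /* restore registers (restoring
                                                   the original bytes omitted) */
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
}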
This setup works just fine for small, single-threaded programs like the following.
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main(int argc, char* argv[]) {
    pid_t my_pid = getpid();
    printf("PID: %ld\n", (long) my_pid);
    getchar();
    return 0;
}
I used a simple shared library that just did puts("Hi"); in its constructor. As stated above, everything up to here works perfectly.
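For completeness, the library was just this (the file name matches the backtrace below; the symbol name is mine):
/* test_library.c -- built with something like:
   gcc -shared -fPIC -o test_library.so test_library.c */
#include <stdio.h>

__attribute__((constructor))
static void on_load(void)
{
    puts("Hi");   /* runs when dlopen() maps the library */
}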
However, when I try to inject the same library into a much bigger (external, closed-source program), I run into a segfault.
Here's the backtrace:
#0 0x00007f6a7985d64d in _dl_relocate_object (scope=0x21fbc08, reloc_mode=reloc_mode#entry=0, consider_profiling=consider_profiling#entry=0)
at dl-reloc.c:259
#1 0x00007f6a79865723 in dl_open_worker (a=a#entry=0x7fff82d7cbf8) at dl-open.c:424
#2 0x00007f6a793cf5d4 in __GI__dl_catch_error (objname=objname#entry=0x7fff82d7cbe8, errstring=errstring#entry=0x7fff82d7cbf0,
mallocedp=mallocedp#entry=0x7fff82d7cbe7, operate=operate#entry=0x7f6a798654c0 <dl_open_worker>, args=args#entry=0x7fff82d7cbf8)
at dl-error-skeleton.c:198
#3 0x00007f6a79865069 in _dl_open (file=0x21fb830 "/home/umang/code/insertion/test_library.so", mode=-2147483646, caller_dlopen=0x40001a, nsid=-2,
argc=<optimized out>, argv=<optimized out>, env=0x7fff82d7cfe8) at dl-open.c:649
#4 0x00007f6a7964ef96 in dlopen_doit (a=a#entry=0x7fff82d7ce08) at dlopen.c:66
#5 0x00007f6a793cf5d4 in __GI__dl_catch_error (objname=objname#entry=0x7f6a798510f0 <last_result+16>,
errstring=errstring#entry=0x7f6a798510f8 <last_result+24>, mallocedp=mallocedp#entry=0x7f6a798510e8 <last_result+8>,
operate=operate#entry=0x7f6a7964ef40 <dlopen_doit>, args=args#entry=0x7fff82d7ce08) at dl-error-skeleton.c:198
#6 0x00007f6a7964f665 in _dlerror_run (operate=operate#entry=0x7f6a7964ef40 <dlopen_doit>, args=args#entry=0x7fff82d7ce08) at dlerror.c:163
#7 0x00007f6a7964f021 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#8 0x000000000040001a in ?? ()
#9 0x00000000021fb830 in ?? ()
#10 0x00007f6a79326a90 in ?? () at malloc.c:3071 from /lib64/libc.so.6
#11 0x00007f6a796488a0 in ?? () from /lib64/libc.so.6
#12 0x0000000000000d68 in ?? ()
#13 0x00007f6a7931e938 in _IO_new_file_underflow (fp=0x7f6a7964efe0 <__dlopen>) at fileops.c:600
#14 0x00007f6a7931fa72 in __GI__IO_default_uflow (fp=0x7f6a796488a0 <_IO_2_1_stdin_>) at genops.c:404
#15 0x00007f6a7931a20d in getchar () at getchar.c:37
#16 0x00000000004005d7 in main ()
This backtrace tells me something went (horribly) wrong in the dlopen call. Specifically, the error lies at glibc dl-reloc.c:259.
Here's the questionable glibc code.
254 l->l_lookup_cache.value = _lr; })) \
255 : l)
256
257 #include "dynamic-link.h"
258
259 ELF_DYNAMIC_RELOCATE (l, lazy, consider_profiling, skip_ifunc);
260
261 #ifndef PROF
262 if (__glibc_unlikely (consider_profiling)
263 && l->l_info[DT_PLTRELSZ] != NULL)
ELF_DYNAMIC_RELOCATE is a macro defined in dynamic-link.h as follows:
/* This can't just be an inline function because GCC is too dumb
to inline functions containing inlines themselves. */
# define ELF_DYNAMIC_RELOCATE(map, lazy, consider_profile, skip_ifunc) \
do { \
int edr_lazy = elf_machine_runtime_setup ((map), (lazy), \
(consider_profile)); \
ELF_DYNAMIC_DO_REL ((map), edr_lazy, skip_ifunc); \
ELF_DYNAMIC_DO_RELA ((map), edr_lazy, skip_ifunc); \
} while (0)
#endif
elf_machine_runtime_setup returns just fine, so I'm assuming that the problem lies with ELF_DYNAMIC_DO_REL. This is the source for the mentioned macro. The problem here is that the functions it calls are inlined, so GDB only displays the macro name and not the underlying source.
Using ni in GDB, I see the following after elf_machine_runtime_setup returns:
ELF_DYNAMIC_RELOCATE (l, lazy, consider_profiling, skip_ifunc);
ELF_DYNAMIC_RELOCATE (l, lazy, consider_profiling, skip_ifunc);
ELF_DYNAMIC_RELOCATE (l, lazy, consider_profiling, skip_ifunc);
Stepping through assembly, the segfault happens after the following instruction: movaps %xmm0,-0x70(%rbp).
info local isn't of much help:
(gdb) info local
ranges = {{start = 140072440991568, size = 0, nrelative = 0, lazy = 670467104}, {start = 0, size = 140072438891376, nrelative = 140072441065920,
lazy = 672664367}}
textrels = 0x0
errstring = 0x0
lazy = <optimized out>
skip_ifunc = 0
Interestingly enough, when I use GDB to inject the shared library (using this code I found somewhere on the net), the library loads perfectly.
sudo gdb -n -q -batch \
-ex "attach $pid" \
-ex "set \$dlopen = (void*(*)(char*, int)) dlopen" \
-ex "call \$dlopen(\"$(pwd)/libexample.so\", 1)" \
-ex "detach" \
-ex "quit"
)"
Thanks in advance!
After days of scratching my head and tearing my hair out, I decided to Google "MOVAPS segfault".
MOVAPS is a SIMD instruction (and here, it is used to quickly zero out 16 bytes of stack). Here's some more info about it.
On taking a closer look, I noticed the following paragraph:
When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) is generated.
Hmm. So I read the value of the offending address.
(gdb) print $rbp - 0x70
$2 = (void *) 0x7ffecd32e838
There. The address isn't aligned to a 16-byte boundary and thus the segfault occurs. (The x86-64 System V ABI expects the stack to be 16-byte aligned at every call; the stack pointer my stub inherited from the interrupted process left rsp 8 bytes off that boundary at the calls it makes.)
Fixing this was easy.
; Create a new stack frame
push rbp
sub rsp, 0x8
; Do stuff
; Fix the stack pointer
add rsp, 0x8
; Destroy stack frame, return, etc.
I'm still doubtful if this is the right way to do it, but it works.
Oh, and GDB got it right the whole time - it made sure that the stack was aligned.
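For anyone copying this approach: the alignment can also be enforced from the injector side before redirecting execution. A minimal sketch of that variant (my own suggestion, not what my code above does), which works because the stub then performs an even number of pushes before each call:
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Hypothetical alternative to the "sub rsp, 0x8" patch: align the target's
   stack before pointing rip at the injected stub. */
static void start_stub_aligned(pid_t pid, unsigned long stub_addr)
{
    struct user_regs_struct regs;

    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
    regs.rsp -= 128;          /* also skip the interrupted function's red zone */
    regs.rsp &= ~0xFULL;      /* force 16-byte alignment                       */
    regs.rip = stub_addr;
    ptrace(PTRACE_SETREGS, pid, NULL, &regs);
    ptrace(PTRACE_CONT, pid, NULL, NULL);
}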
Background
I've built qemu-system-x86_64.exe on a Windows machine using MSYS2 (x86_64), and I'm debugging a segmentation fault that happens when I try to run it.
Actually, I don't think the problem is related to either QEMU or MSYS2; it's a general problem of debugging a segmentation fault and possibly of wrong code generation.
Debugging the Segmentation Fault
The program crashes with a segmentation fault right at startup.
Running it under gdb, I found out the following:
Starting program: C:\msys64\home\Administrator\qemu\x86_64-softmmu\qemu-system-x86_64.exe
[New Thread 4656.0x1194]
Program received signal SIGSEGV, Segmentation fault.
0x00000000007d3254 in getpagesize () at util/oslib-win32.c:535
535 {
(gdb) bt
#0 0x00000000007d3254 in getpagesize () at util/oslib-win32.c:535
#1 0x000000000086dd39 in init_real_host_page_size () at util/pagesize.c:16
#2 0x00000000007ea1b2 in __do_global_ctors ()
at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:67
#3 0x00000000007ea20f in __main ()
at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/gccmain.c:83
#4 0x000000000040137f in __tmainCRTStartup ()
at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:329
#5 0x00000000004014db in WinMainCRTStartup ()
at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:195
This is strange.
The program crashes while running __do_global_ctors and calling init_real_host_page_size(), which calls getpagesize(). These are really simple functions:
uintptr_t qemu_real_host_page_size;
intptr_t qemu_real_host_page_mask;
static void __attribute__((constructor)) init_real_host_page_size(void)
{
    qemu_real_host_page_size = getpagesize();
    qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
}
...
int getpagesize(void)
{
    SYSTEM_INFO system_info;
    GetSystemInfo(&system_info);
    return system_info.dwPageSize;
}
getpagesize() crashes right at the beginning of the function, before it even calls GetSystemInfo.
Here is the disassembly of that code fragment and register values:
(gdb) disassem
Dump of assembler code for function getpagesize:
0x00000000007d3250 <+0>: sub $0x68,%rsp
=> 0x00000000007d3254 <+4>: mov %fs:0x0,%rax
0x00000000007d325d <+13>: mov %rax,0x58(%rsp)
0x00000000007d3262 <+18>: xor %eax,%eax
0x00000000007d3264 <+20>: lea 0x20(%rsp),%rcx
0x00000000007d3269 <+25>: callq *0x68e8b9(%rip) # 0xe61b28 <__imp_GetSystemInfo>
0x00000000007d326f <+31>: mov 0x24(%rsp),%eax
0x00000000007d3273 <+35>: mov 0x58(%rsp),%rdx
0x00000000007d3278 <+40>: xor %fs:0x0,%rdx
0x00000000007d3281 <+49>: jne 0x7d3288 <getpagesize+56>
0x00000000007d3283 <+51>: add $0x68,%rsp
0x00000000007d3287 <+55>: retq
0x00000000007d3288 <+56>: callq 0x85bde0 <__stack_chk_fail>
0x00000000007d328d <+61>: nop
End of assembler dump.
(gdb) info registers
rax 0x6f4b868 116701288
rbx 0x86ec10 8842256
rcx 0x6f4b8b8 116701368
rdx 0xe5a780 15050624
rsi 0x86e220 8839712
rdi 0x6f4ad50 116698448
rbp 0x6f4ad10 0x6f4ad10
rsp 0x22fd80 0x22fd80
r8 0x0 0
r9 0x0 0
r10 0x5000016b 1342177643
r11 0x22f9d8 2292184
r12 0x0 0
r13 0x10 16
r14 0x0 0
r15 0x0 0
rip 0x7d3254 0x7d3254 <getpagesize+4>
eflags 0x10202 [ IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x53 83
gs 0x2b 43
It looks like something is wrong with the memory access mov %fs:0x0,%rax.
Who sets FS to 83?
(gdb) starti
Starting program: C:\msys64\home\Administrator\qemu\x86_64-softmmu\qemu-system-x86_64.exe
[New Thread 3508.0x14b0]
Program stopped.
0x00000000778b6fb1 in ntdll!CsrSetPriorityClass ()
from C:\Windows\SYSTEM32\ntdll.dll
(gdb) p $fs
$1 = 83
(gdb) watch $fs
Watchpoint 1: $fs
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000000007d3254 in getpagesize () at util/oslib-win32.c:535
535 {
No one sets FS!
Questions
GCC generated code that uses an uninitialized register. What could cause that? Was there some initialization code that should have run but didn't?
Any ideas on how I can debug this issue further?
FS is an x86 segment register. These are generally not set by the user program, but instead set by the OS or by the runtime libraries, for various special purposes. For instance on Windows x86-64 GS is used to point to a per-thread data block: https://en.wikipedia.org/wiki/Win32_Thread_Information_Block (and FS is not used).
In this case the problem is a bug in the GCC 8 compiler you are using: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86832
In some situations this compiler generates code that assumes FS has been set up for "native TLS", which is wrong because MINGW does not support "native TLS" and FS is not set to anything useful.
The workaround is to avoid compiling with the -fstack-protector-strong compiler option. For QEMU you can do that by passing configure the flag --disable-stack-protector.
(PS: if you want to know how I identified the cause of this segfault: I googled for 'qemu-devel sigsegv getpagesize', which brings up a mailing list thread where somebody else ran into and reported the bug, the problem was diagnosed and a link to the GCC bug found.)
Is there a way to break on a variable/memory address being freed? Does a simple watchpoint work in this case?
Scenario: I'm getting a segfault when the program frees its variables; apparently the variable in question is being freed twice, and I need to know where it is freed first.
PS: commands for both lldb and gdb, if possible.
How to do it in gdb
In gdb you can set conditional breakpoints by writing break location if condition.
This means that in order to break on free where the function is invoked with a pointer that refers to some dynamically allocated data, we first have to obtain the address of the dynamically allocated data, and then set a breakpoint with a suitable condition.
Further reading:
sourceware.org :: Conditions - Debugging with gdb
fayewilliams.com - GDB Conditional Breakpoints
Setting up the experiment
/tmp% cat > foo.c <<EOF
> #include <stdlib.h>
>
> void some_function (int * p) {
> free (p);
> }
>
> int main () {
> int * p = malloc (sizeof (int));
>
> some_function (p);
>
> free (p);
>
> return 0;
> }
> EOF
/tmp% gcc -g foo.c -o a.out
The Adventure
/tmp% gdb ./a.out
The first thing we need to do is find a suitable breakpoint so that we can inspect the contents of the variable we would like to watch, more specifically the address of the dynamically allocated memory.
(gdb) list main
2
3 void some_function (int * p) {
4 free (p);
5 }
6
7 int main () {
8 int * p = malloc (sizeof (int));
9
10 some_function (p);
11
We will then set a breakpoint at a suitable place, in this case line 9, and run the application to see what value is stored in p.
(gdb) break 9
Breakpoint 1 at 0x400577: file foo.c, line 9.
(gdb) run
Starting program: /tmp/a.out
Breakpoint 1, main () at foo.c:10
(gdb) print p
$1 = (int *) 0x601010
Once we know the address whose freeing we would like to catch, we can easily set a conditional breakpoint at our desired location. First we will need to make sure that free is actually named what we think.
(gdb) disas main
Dump of assembler code for function main:
   0x0000000000400561 <+0>:     push   %rbp
   0x0000000000400562 <+1>:     mov    %rsp,%rbp
   0x0000000000400565 <+4>:     sub    $0x10,%rsp
   0x0000000000400569 <+8>:     mov    $0x4,%edi
   0x000000000040056e <+13>:    callq  0x400440 <malloc@plt>
   0x0000000000400573 <+18>:    mov    %rax,-0x8(%rbp)
=> 0x0000000000400577 <+22>:    mov    -0x8(%rbp),%rax
   0x000000000040057b <+26>:    mov    %rax,%rdi
   0x000000000040057e <+29>:    callq  0x400546 <some_function>
   0x0000000000400583 <+34>:    mov    -0x8(%rbp),%rax
   0x0000000000400587 <+38>:    mov    %rax,%rdi
   0x000000000040058a <+41>:    callq  0x400410 <free@plt>
   0x000000000040058f <+46>:    mov    $0x0,%eax
   0x0000000000400594 <+51>:    leaveq
   0x0000000000400595 <+52>:    retq
We can now create the breakpoint, and continue execution to see where our data is freed:
(gdb) break free@plt if $rdi == 0x601010
Breakpoint 2 at 0x400410 (3 locations)
(gdb) cont
Continuing.
Breakpoint 2, 0x0000000000400410 in free@plt ()
(gdb) backtrace
#0 0x0000000000400410 in free@plt ()
#1 0x000000000040055e in some_function (p=0x601010) at foo.c:4
#2 0x0000000000400583 in main () at foo.c:10
(gdb) cont
Continuing.
Breakpoint 2, 0x0000000000400410 in free@plt ()
(gdb) backtrace
#0 0x0000000000400410 in free@plt ()
#1 0x000000000040058f in main () at foo.c:12
(gdb) cont
Continuing.
*** Error in `/tmp/a.out': double free or corruption (fasttop): 0x0000000000601010 ***
...
I'd like to create a complete instruction trace of the execution of a program, to collect some stats etc. I first tried using Linux's ptrace functionality to step through a program (using the tutorial here). This creates two processes, the traced one and the debugger, and they communicate via signals. I only got around 16K instructions per second (on a 1.6GHz Atom), so this is too slow for anything non-trivial.
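For reference, the ptrace version was essentially the classic single-step loop (a simplified sketch; error handling omitted):
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* let the parent trace us */
        execvp(argv[1], &argv[1]);
        return 1;
    }

    long count = 0;
    int status;
    waitpid(child, &status, 0);                  /* stopped at the exec */
    while (!WIFEXITED(status)) {
        count++;                                 /* one stop per instruction */
        ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        waitpid(child, &status, 0);
    }
    printf("%ld instructions\n", count);
    return 0;
}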
I thought the interprocess communication via signals was too slow, so I tried setting up the debugging in the same process as the execution: set the trap flag and install a signal handler. When a software interrupt is used to make a syscall, the trap flag should be saved and the kernel would use its own flags, or so I thought. But my program somehow gets killed by SIGTRAP.
This is what I set up:
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
int cycle = 0;
void trapHandler(int signum) {
    if (cycle % 262144 == 0) {
        write(STDOUT_FILENO, " trap\n", 6);
    }
    cycle += 1;
}

void startTrace() {
    // set up signal handler
    signal(SIGTRAP, trapHandler);
    // set trap flag
    asm volatile("pushfl\n"
                 "orl $0x100, (%esp)\n"
                 "popfl\n");
}

void printRock() {
    char* s = "Rock\n";
    asm(
        "movl $5, %%edx\n"  // message length
        "movl %0, %%ecx\n"  // message to write
        "movl $1, %%ebx\n"  // file descriptor (stdout)
        "movl $4, %%eax\n"  // system call number (sys_write)
        "int $0x80\n"       // syscall
        :                   // no output regs
        : "r"(s)            // input text
        : "edx", "ecx", "ebx", "eax"
    );
}

int main() {
    startTrace();
    // some computation
    int x = 0;
    int i;
    for (i = 0; i < 100000; i++) {
        x += i * 2;
    }
    printRock();
    write(STDOUT_FILENO, "Paper\n", 6);
    write(STDOUT_FILENO, "Scissors\n", 9);
}
When running, this gives:
trap
trap
trap
Rock
Paper
trap
Trace/breakpoint trap (core dumped)
So now we get about 250K instructions per second, still slow but non-trivial executions are possible. But there is that core dump that appears to happen between the two write calls. In GDB, we see where it happens:
Dump of assembler code for function __kernel_vsyscall:
0xb76f3414 <+0>: push %ecx
0xb76f3415 <+1>: push %edx
0xb76f3416 <+2>: push %ebp
0xb76f3417 <+3>: mov %esp,%ebp
0xb76f3419 <+5>: sysenter
0xb76f341b <+7>: nop
0xb76f341c <+8>: nop
0xb76f341d <+9>: nop
0xb76f341e <+10>: nop
0xb76f341f <+11>: nop
0xb76f3420 <+12>: nop
0xb76f3421 <+13>: nop
0xb76f3422 <+14>: int $0x80
=> 0xb76f3424 <+16>: pop %ebp
0xb76f3425 <+17>: pop %edx
0xb76f3426 <+18>: pop %ecx
0xb76f3427 <+19>: ret
And the backtrace:
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0xb77c5424 in __kernel_vsyscall ()
#1 0xb76d0553 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
#2 0x0804847d in trapHandler (signum=5) at count.c:8
#3 <signal handler called>
#4 0xb77c5424 in __kernel_vsyscall ()
#5 0xb76d0553 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
#6 0x08048537 in main () at count.c:49
It appears that syscalls made via int 0x80 are fine, but the write calls go through the kernel's VDSO/vsyscall page and break somehow (I didn't know about this functionality; it's described in more detail here). It may be related to using sysenter rather than int 0x80; maybe the trap flag survives into the kernel there. I don't quite get what's going on with the recursive __kernel_vsyscall calls, and I also don't get why there's an int 0x80 instruction inside the __kernel_vsyscall function.
Does anybody have a suggestion as to what's going on, and how to fix it? Maybe it's possible to disable the VDSO/vsyscall page? Or is it possible to override the __kernel_vsyscall function with one that uses int 0x80 rather than sysenter?
Answering own question.
I didn't figure out exactly what was happening, but I found a workaround: disable the VDSO. That can be done via
sudo sysctl vm.vdso_enabled=0
With this, single-stepping through a program works, including stepping across system calls. Disclaimer: don't blame me if things go bad.
EDIT: After updating my Linux (32-bit x86) much later, this error doesn't occur anymore. Maybe it was a bug that was fixed.
I am using NASM on Linux to write a basic assembly program that calls a function from the C library (printf). Unfortunately, I am getting a segmentation fault when doing so. Commenting out the call to printf allows the program to run without error.
; Build using these commands:
; nasm -f elf64 -g -F stabs <filename>.asm
; gcc <filename>.o -o <filename>
;
SECTION .bss ; Section containing uninitialized data
SECTION .data ; Section containing initialized data
text db "hello world",10 ;
SECTION .text ; Section containing code
global main
extern printf
;-------------
;MAIN PROGRAM BEGINS HERE
;-------------
main:
push rbp
mov rbp,rsp
push rbx
push rsi
push rdi ;preserve registers
;****************
;code i wish to execute
push text ;pushing address of text on to the stack
;x86-64 uses registers for first 6 args, thus should have been:
;mov rdi,text (place address of text in rdi)
;mov rax,0 (place a terminating byte at end of rdi)
call printf ;calling printf from c-libraries
add rsp,8 ;reseting the stack to pre "push text"
;**************
pop rdi ;preserve registers
pop rsi
pop rbx
mov rsp,rbp
pop rbp
ret
x86_64 does not use the stack for the first 6 args. You need to load them in the proper registers. Those are:
rdi, rsi, rdx, rcx, r8, r9
The trick I use to remember the first two is to imagine the function is memcpy implemented as rep movsb, which takes its destination in rdi and its source in rsi.
You're calling a varargs function -- printf expects a variable number of arguments, and you have to account for that: on x86-64, the SysV ABI requires al to hold the number of vector registers used in the call (xor eax, eax here, since there are no floating-point arguments). See here: http://www.csee.umbc.edu/portal/help/nasm/sample.shtml#printf1
I'm using libcurl in my program, and running into a segfault. Before I filed a bug with the curl project, I thought I'd do a little debugging. What I found seemed very odd to me, and I haven't been able to make sense of it yet.
First, the segfault traceback:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe77f6700 (LWP 592)]
0x00007ffff6a2ea5c in memcpy () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007ffff6a2ea5c in memcpy () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff5bc29e5 in x509_name_oneline (a=0x7fffe3d9c3c0,
buf=0x7fffe77f4ec0 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%", size=255) at ssluse.c:629
#2 0x00007ffff5bc2a6f in cert_verify_callback (ok=1, ctx=0x7fffe77f50b0)
at ssluse.c:645
#3 0x00007ffff72c9a80 in ?? () from /lib/libcrypto.so.0.9.8
#4 0x00007ffff72ca430 in X509_verify_cert () from /lib/libcrypto.so.0.9.8
#5 0x00007ffff759af58 in ssl_verify_cert_chain () from /lib/libssl.so.0.9.8
#6 0x00007ffff75809f3 in ssl3_get_server_certificate ()
from /lib/libssl.so.0.9.8
#7 0x00007ffff7583e50 in ssl3_connect () from /lib/libssl.so.0.9.8
#8 0x00007ffff5bc48f0 in ossl_connect_step2 (conn=0x7fffe315e9a8, sockindex=0)
at ssluse.c:1724
#9 0x00007ffff5bc700f in ossl_connect_common (conn=0x7fffe315e9a8,
sockindex=0, nonblocking=false, done=0x7fffe77f543f) at ssluse.c:2498
#10 0x00007ffff5bc7172 in Curl_ossl_connect (conn=0x7fffe315e9a8, sockindex=0)
at ssluse.c:2544
#11 0x00007ffff5ba76b9 in Curl_ssl_connect (conn=0x7fffe315e9a8, sockindex=0)
...
The call to memcpy looks like this:
memcpy(buf, biomem->data, size);
(gdb) p buf
$46 = 0x7fffe77f4ec0 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%"
(gdb) p biomem->data
$47 = 0x7fffe3e1ef60 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%"
(gdb) p size
$48 = 255
If I go up a frame, I see that the pointer passed in for buf came from a local variable defined in the calling function:
char buf[256];
Here's where it starts to get weird. I can manually inspect all 256 bytes of both buf and biomem->data without gdb complaining that the memory isn't accessible. I can also manually write all 256 bytes of buf using the gdb set command, without any error. So if all the memory involved is readable and writable, why does memcpy fail?
Also interesting is that I can use gdb to manually call memcpy with the pointers involved. As long as I pass a size <= 160, it runs without a problem. As soon as I pass 161 or higher, gdb gets a sigsegv. I know buf is larger than 160, because it was created on the stack as an array of 256. biomem->data is a little harder to figure, but I can read well past byte 160 with gdb.
I should also mention that this function (or rather the curl method I call that leads to this) completes successfully many times before the crash. My program uses curl to repeatedly call a web service API while it runs. It calls the API every five seconds or so, and runs for about 14 hours before it crashes. It's possible that something else in my app is writing out of bounds and stomping on something that creates the error condition. But it seems suspicious that it crashes at exactly the same point every time, although the timing varies. And all the pointers seem ok in gdb, but memcpy still fails. Valgrind doesn't find any bounds errors, but I haven't let my program run with valgrind for 14 hours.
Within memcpy itself, the disassembly looks like this:
(gdb) x/20i $rip-10
0x7ffff6a2ea52 <memcpy+242>: jbe 0x7ffff6a2ea74 <memcpy+276>
0x7ffff6a2ea54 <memcpy+244>: lea 0x20(%rdi),%rdi
0x7ffff6a2ea58 <memcpy+248>: je 0x7ffff6a2ea90 <memcpy+304>
0x7ffff6a2ea5a <memcpy+250>: dec %ecx
=> 0x7ffff6a2ea5c <memcpy+252>: mov (%rsi),%rax
0x7ffff6a2ea5f <memcpy+255>: mov 0x8(%rsi),%r8
0x7ffff6a2ea63 <memcpy+259>: mov 0x10(%rsi),%r9
0x7ffff6a2ea67 <memcpy+263>: mov 0x18(%rsi),%r10
0x7ffff6a2ea6b <memcpy+267>: mov %rax,(%rdi)
0x7ffff6a2ea6e <memcpy+270>: mov %r8,0x8(%rdi)
0x7ffff6a2ea72 <memcpy+274>: mov %r9,0x10(%rdi)
0x7ffff6a2ea76 <memcpy+278>: mov %r10,0x18(%rdi)
0x7ffff6a2ea7a <memcpy+282>: lea 0x20(%rsi),%rsi
0x7ffff6a2ea7e <memcpy+286>: lea 0x20(%rdi),%rdi
0x7ffff6a2ea82 <memcpy+290>: jne 0x7ffff6a2ea30 <memcpy+208>
0x7ffff6a2ea84 <memcpy+292>: data32 data32 nopw %cs:0x0(%rax,%rax,1)
0x7ffff6a2ea90 <memcpy+304>: and $0x1f,%edx
0x7ffff6a2ea93 <memcpy+307>: mov -0x8(%rsp),%rax
0x7ffff6a2ea98 <memcpy+312>: jne 0x7ffff6a2e969 <memcpy+9>
0x7ffff6a2ea9e <memcpy+318>: repz retq
(gdb) info registers
rax 0x0 0
rbx 0x7fffe77f50b0 140737077268656
rcx 0x1 1
rdx 0xff 255
rsi 0x7fffe3e1f000 140737016623104
rdi 0x7fffe77f4f60 140737077268320
rbp 0x7fffe77f4e90 0x7fffe77f4e90
rsp 0x7fffe77f4e48 0x7fffe77f4e48
r8 0x11 17
r9 0x10 16
r10 0x1 1
r11 0x7ffff6a28f7a 140737331236730
r12 0x7fffe3dde490 140737016358032
r13 0x7ffff5bc2a0c 140737316137484
r14 0x7fffe3d69b50 140737015880528
r15 0x0 0
rip 0x7ffff6a2ea5c 0x7ffff6a2ea5c <memcpy+252>
eflags 0x10203 [ CF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb) p/x $rsi
$50 = 0x7fffe3e1f000
(gdb) x/20x $rsi
0x7fffe3e1f000: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffe3e1f010: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffe3e1f020: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffe3e1f030: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffe3e1f040: 0x00000000 0x00000000 0x00000000 0x00000000
I'm using libcurl version 7.21.6, c-ares version 1.7.4, and openssl version 1.0.0d. My program is multithreaded, but I have registered mutex callbacks with openssl. The program is running on Ubuntu 11.04 desktop, 64-bit. libc is 2.13.
Clearly libcurl is over-reading the source buffer, and stepping into unreadable memory (page at 0x7fffe3e1f000 -- you can confirm that memory is unreadable by looking at /proc/<pid>/maps for the program being debugged).
Here's where it starts to get weird. I can manually inspect all 256 bytes of both
buf and biomem->data without gdb complaining that the memory isn't accessible.
There is a well-known Linux kernel flaw: even for memory that has PROT_NONE (and causes SIGSEGV on any attempt to read it from the process itself), an attempt by GDB to ptrace(PTRACE_PEEKDATA, ...) succeeds. That explains why you can examine all 256 bytes of the source buffer in GDB, even though only 160 of them are actually accessible.
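If you want to reproduce that behaviour outside of curl, something like this (my own toy example) gives you a PROT_NONE page to poke at from the debugger:
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* One inaccessible page: reading it from within the process segfaults,
       but an attached debugger may still be able to examine it. */
    char *p = mmap(NULL, 4096, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    printf("pid %d, PROT_NONE page at %p\n", (int)getpid(), (void *)p);
    pause();           /* attach gdb now and try: x/16bx <that address>   */
    return p[0];       /* if ever reached, this read raises SIGSEGV       */
}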
Try running your program under Valgrind; chances are it will tell you that you are memcpy-ing from a heap-allocated buffer that is too small.
Do you have any possibility of creating a "crumple zone"?
That is, deliberately increasing the size of the two buffers, or in the case of the structure, putting an extra unused element after the destination?
You then seed the source crumple zone with something such as 0xDEADBEEF, and the destination with something equally recognizable. If the destination ever changes, you've got something to work with.
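Something along these lines, purely as an illustration (the real buffers here live inside curl/openssl, so you would have to patch them there):
#include <stdint.h>

/* A "crumple zone": guard words around the buffer you suspect, seeded with
   a known pattern and checked after the operation you distrust. */
struct guarded_buf {
    uint32_t front[4];    /* seed with 0xDEADBEEF */
    char     data[256];   /* the buffer actually in use */
    uint32_t back[4];     /* seed with 0xDEADBEEF */
};

static void guard_init(struct guarded_buf *b)
{
    for (int i = 0; i < 4; i++)
        b->front[i] = b->back[i] = 0xDEADBEEFu;
}

static int guard_intact(const struct guarded_buf *b)
{
    for (int i = 0; i < 4; i++)
        if (b->front[i] != 0xDEADBEEFu || b->back[i] != 0xDEADBEEFu)
            return 0;     /* something wrote past the ends of data[] */
    return 1;
}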
256 is a bit suggestive: any possibility it could somehow be treated as a signed quantity, becoming -1 and hence very big? I can't see how gdb wouldn't show it, but...