Is mmap a built in function? - c

I know that mmap is a system call, but there must be some wrapper in glibc that does the system call. Yet when I try to use gdb to step through the mmap function in my program, gdb ignores it as it can't find any source file for it (Note I compile my own glibc from source). I can step through other glibc library functions such as printf and malloc but not mmap. I also use the flag -fno-builtin so that gcc doesn't use built in functions. Any help on this will be greatly appreciated.

I don't know what your problem is. It works perfectly fine for me.
Using system libc.so.6, with debug symbols installed:
// mmap.c
#include <sys/mman.h>
int main()
{
void *p = mmap(0, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
return 0;
}
gcc -g mmap.c
$ gdb -q a.out
Reading symbols from /tmp/a.out...done.
(gdb) start
Temporary breakpoint 1 at 0x40052c: file mmap.c, line 5.
Temporary breakpoint 1, main () at mmap.c:5
5 void *p = mmap(0, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
(gdb) step
mmap64 () at ../sysdeps/unix/syscall-template.S:82
82 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb)
mmap64 () at ../sysdeps/unix/syscall-template.S:83
83 in ../sysdeps/unix/syscall-template.S
(gdb)
main () at mmap.c:6
6 return 0;
(gdb) q
Using my own glibc build:
gdb -q a.out
Reading symbols from /tmp/a.out...done.
(gdb) start
Temporary breakpoint 1 at 0x40056c: file mmap.c, line 5.
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Temporary breakpoint 1, main () at mmap.c:5
5 void *p = mmap(0, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
(gdb) step
mmap64 () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb)
mmap64 () at ../sysdeps/unix/syscall-template.S:82
82 ret
(gdb)
main () at mmap.c:6
6 return 0;
(gdb) q

Related

gdb does not find libc6-dbg info for resolve lib

I am debugging busybox on WSL, for which I downloaded the busybox source code via apt source busybox, and compiled it with debug symbols via make defconfig, make menuconfig (for activating debug build), make.
For libc debug symbols I installed libc6-dbg; while the debug symbols seem to be loaded properly, I still struggle with symbols for resolv/ns_parse.c, which I expected to be provided by libc6-dbg.
How do I get gdb to find the debug symbols for ./resolv/ns_parse.c from (g)libc?
$ gdb --args ./busybox_unstripped nslookup management.azure.com
GNU gdb (Ubuntu 12.0.90-0ubuntu1) 12.0.90
[...]
Reading symbols from ./busybox_unstripped...
(gdb) set verbose on
(gdb) start
Reading in symbols for libbb/appletlib.c...
Temporary breakpoint 1 at 0x11b15: file libbb/appletlib.c, line 1034.
Starting program: /tmp/tmp.tJK9POgK2R/busybox-1.30.1/busybox_unstripped nslookup management.azure.com
Using PIE (Position Independent Executable) displacement 0x555555554000 for "/tmp/tmp.tJK9POgK2R/busybox-1.30.1/busybox_unstripped".
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/61/ef896a699bb1c2e4e231642b2e1688b2f1a61e.debug...
Reading symbols from system-supplied DSO at 0x7ffff7fc2000...
(No debugging symbols found in system-supplied DSO at 0x7ffff7fc2000)
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
Reading symbols from /usr/lib/debug/.build-id/27/e82301dba6c3f644404d504e1bb1c97894b433.debug...
Reading symbols from /lib/x86_64-linux-gnu/libresolv.so.2...
Reading symbols from /usr/lib/debug/.build-id/7f/d7253c61aa6fce2b7e13851c15afa14a5ab160.debug...
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/69/389d485a9793dbe873f0ea2c93e02efaa9aa3d.debug...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main (Reading in symbols for ../sysdeps/x86/libc-start.c...
argc=3, argv=0x7fffffffe248) at libbb/appletlib.c:1034
1034 {
(gdb) break nslookup.c:336
Reading in symbols for networking/nslookup.c...
Breakpoint 2 at 0x555555592df0: file networking/nslookup.c, line 336.
(gdb) c
Continuing.
Server: 172.20.48.1
Address: 172.20.48.1:53
Breakpoint 2, parse_reply (len=<optimized out>, msg=0x7fffffffd967 "\335y\201") at networking/nslookup.c:348
348 if (!header->aa)
(gdb) n
349 printf("Non-authoritative answer:\n");
(gdb) n
Reading in symbols for ioputs.c...
Non-authoritative answer:
351 if (ns_initparse(msg, len, &handle) != 0) {
(gdb) s
Reading in symbols for ns_parse.c...
__GI_ns_initparse (msg=msg@entry=0x7fffffffd967 "\335y\201", msglen=msglen@entry=512, handle=handle@entry=0x7fffffffd4d0) at ./resolv/ns_parse.c:90
90 ./resolv/ns_parse.c: No such file or directory.
Oh, well, it seems to be quite stupid: Sources and debug symbols are different things. :)
After downloading sources via apt source libc6 and adding it to directories sources are now shown...
(gdb) set directories ../busybox-1.30.1/:../glibc-2.35/
(gdb) show directories
Source directories searched: /tmp/tmp.tJK9POgK2R/busybox-1.30.1/../busybox-1.30.1:/tmp/tmp.tJK9POgK2R/busybox-1.30.1/../glibc-2.35:$cdir:$cwd
(gdb) list
85 return (ptr - optr);
86 }
87 libresolv_hidden_def (ns_skiprr)
88
89 int
90 ns_initparse(const u_char *msg, int msglen, ns_msg *handle) {
91 const u_char *eom = msg + msglen;
92 int i;
93
94 memset(handle, 0x5e, sizeof *handle);
(gdb)

Address of Static Variables Changing at Runtime

I'm trying to figure out why the address of a static uint64_t arr[] changes when it's defined in the global scope inside the main executable.
It changes from 0x201060 (defined by the linker?) to 0x555555755060 at runtime, and I have no idea why.
Why does this happen, and is there a way I can prevent this behavior?
I have a precompiled binary that does not exhibit this behavior, and I am trying to emulate it.
$ gdb a.out # compiled from test.c
GNU gdb (GDB) 8.0.1...
Reading symbols from a.out...done.
(gdb) x/x arr
0x201060 <arr>: 0x00000024
(gdb) b main
Breakpoint 1 at 0x6e9: file test.c, line 116.
(gdb) run
Starting program: ...
Breakpoint 1, main (argc=1, argv=0x7fffffffdb28) at test.c:116
116 if(argc != 2) {
(gdb) x/x arr
0x555555755060 <arr>: 0x00000024
test.c was compiled with the following options: -g -fno-stack-protector -z execstack.
I compiled and ran test.c without ASLR (sudo bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space'), but the result was the same.
The relevant parts of test.c are:
#include <stdint.h>
extern int func(uint64_t[]);
static uint64_t arr[] = {
0x00000024, 0x00201060,
0x00201080, 0x00000000,
0x00000008, 0x002010e0,
0x002010a0, 0x00000000,
0x00000032, 0x002010c0,
...
0x00201100, 0x00000000
};
int main(int argc, char** argv) {
func(arr);
return 0;
}
I figured it out :)
It turns out my gcc was outputting PIE executables by default, and passing -no-pie did what I needed. I made the array static in an attempt to keep the address the same, but I suppose that static only keeps the address the same during runtime.
Thank you to Mark Plotnick for your suggestion in the comments!

LD_PRELOAD with possible static shared library functions

My objective is to hook the open function that dlopen on linux uses. For some reason, this code is not hooking dlopen->open, but it does hook my version of open main.c->open. Is dlopen not using my symbols somehow?
Compilation process is as follows:
gcc main.c -ldl -ggdb
gcc fake-open.c -o libexample.so -fPIC -shared
export LD_PRELOAD="$PWD/libexample.so"
When I run the program, everything works. Ensuring the LD_PRELOAD variable is set.. etc.
Here is the problem, when I try to hook the open function directly or indirectly called by dlopen, somehow this "version" of open is not being resolved/redirected/hooked by my version.
[main.c]
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
puts("calling open");
int fd = open("/tmp/test.so", O_RDONLY|O_CLOEXEC);
puts("calling dlopen");
int *handle = dlopen("/tmp/test.so", RTLD_LAZY);
}
[fake-open.c]
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/stat.h>
//#include <fcntl.h>
int open(const char *pathname, int flags)
{
puts("from hooked..");
return 1;
}
Console Output:
calling open
from hooked..
calling dlopen
I know for a fact dlopen is somehow calling open due to strace.
write(1, "calling open\n", 13calling open
) = 13
write(1, "from hooked..\n", 14from hooked..
) = 14
write(1, "calling dlopen\n", 15calling dlopen
) = 15
brk(0) = 0x804b000
brk(0x806c000) = 0x806c000
open("/tmp/test.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0\3\0\1\0\0\0`\205\4\0104\0\0\0"..., 512) = 512
But, for some reason, when dlopen calls open, it is not using my version of open. This has to be some kind of linking or run-time symbol resolution problem, or perhaps dlopen is using a static version of open and doesn't need to resolve any symbols at run or load time?
First, contrary to @usr's answer, dlopen does open the library.
We can confirm this by running a simple test under GDB:
// main.c
#include <dlfcn.h>
int main()
{
void *h = dlopen("./foo.so", RTLD_LAZY);
return 0;
}
// foo.c; compile with "gcc -fPIC -shared -o foo.so foo.c"
int foo() { return 0; }
Let's compile and run this:
gdb -q ./a.out
(gdb) start
Temporary breakpoint 1 at 0x400605: file main.c, line 4.
Starting program: /tmp/a.out
Temporary breakpoint 1, main () at main.c:4
4 void *h = dlopen("./foo.so", RTLD_LAZY);
(gdb) catch syscall open
Catchpoint 2 (syscall 'open' [2])
(gdb) c
Continuing.
Catchpoint 2 (call to syscall open), 0x00007ffff7df3497 in open64 () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007ffff7df3497 in open64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff7ddf5bd in open_verify (name=0x602010 "./foo.so", fbp=0x7fffffffd568, loader=<optimized out>, whatcode=<optimized out>, found_other_class=0x7fffffffd550, free_name=<optimized out>) at dl-load.c:1930
#2 0x00007ffff7de2d6f in _dl_map_object (loader=loader@entry=0x7ffff7ffe1c8, name=name@entry=0x4006a4 "./foo.so", type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=0) at dl-load.c:2543
#3 0x00007ffff7deea14 in dl_open_worker (a=a@entry=0x7fffffffdae8) at dl-open.c:235
#4 0x00007ffff7de9fc4 in _dl_catch_error (objname=objname@entry=0x7fffffffdad8, errstring=errstring@entry=0x7fffffffdae0, mallocedp=mallocedp@entry=0x7fffffffdad0, operate=operate@entry=0x7ffff7dee960 <dl_open_worker>, args=args@entry=0x7fffffffdae8) at dl-error.c:187
#5 0x00007ffff7dee37b in _dl_open (file=0x4006a4 "./foo.so", mode=-2147483647, caller_dlopen=<optimized out>, nsid=-2, argc=1, argv=0x7fffffffde28, env=0x7fffffffde38) at dl-open.c:661
#6 0x00007ffff7bd702b in dlopen_doit (a=a@entry=0x7fffffffdd00) at dlopen.c:66
#7 0x00007ffff7de9fc4 in _dl_catch_error (objname=0x7ffff7dd9110 <last_result+16>, errstring=0x7ffff7dd9118 <last_result+24>, mallocedp=0x7ffff7dd9108 <last_result+8>, operate=0x7ffff7bd6fd0 <dlopen_doit>, args=0x7fffffffdd00) at dl-error.c:187
#8 0x00007ffff7bd762d in _dlerror_run (operate=operate@entry=0x7ffff7bd6fd0 <dlopen_doit>, args=args@entry=0x7fffffffdd00) at dlerror.c:163
#9 0x00007ffff7bd70c1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#10 0x0000000000400614 in main () at main.c:4
This tells you that on a 64-bit system, dlopen calls open64 instead of open, so your interposer wouldn't work (you'd need to interpose open64 instead).
But you are on a 32-bit system (as evidenced by the 0x806c000 etc. addresses printed by strace), and there the stack trace looks like this:
#0 0xf7ff3774 in open () at ../sysdeps/unix/syscall-template.S:81
#1 0xf7fe1211 in open_verify (name=0x804b008 "./foo.so", fbp=fbp@entry=0xffffc93c, loader=0xf7ffd938, whatcode=whatcode@entry=0, found_other_class=found_other_class@entry=0xffffc933, free_name=free_name@entry=true) at dl-load.c:1930
#2 0xf7fe4114 in _dl_map_object (loader=loader@entry=0xf7ffd938, name=name@entry=0x8048590 "./foo.so", type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=0) at dl-load.c:2543
#3 0xf7feec14 in dl_open_worker (a=0xffffccdc) at dl-open.c:235
#4 0xf7feac06 in _dl_catch_error (objname=objname@entry=0xffffccd4, errstring=errstring@entry=0xffffccd8, mallocedp=mallocedp@entry=0xffffccd3, operate=operate@entry=0xf7feeb50 <dl_open_worker>, args=args@entry=0xffffccdc) at dl-error.c:187
#5 0xf7fee644 in _dl_open (file=0x8048590 "./foo.so", mode=-2147483647, caller_dlopen=0x80484ea <main+29>, nsid=<optimized out>, argc=1, argv=0xffffcf74, env=0xffffcf7c) at dl-open.c:661
#6 0xf7fafcbc in dlopen_doit (a=0xffffce90) at dlopen.c:66
#7 0xf7feac06 in _dl_catch_error (objname=0xf7fb3070 <last_result+12>, errstring=0xf7fb3074 <last_result+16>, mallocedp=0xf7fb306c <last_result+8>, operate=0xf7fafc30 <dlopen_doit>, args=0xffffce90) at dl-error.c:187
#8 0xf7fb037c in _dlerror_run (operate=operate@entry=0xf7fafc30 <dlopen_doit>, args=args@entry=0xffffce90) at dlerror.c:163
#9 0xf7fafd71 in __dlopen (file=0x8048590 "./foo.so", mode=1) at dlopen.c:87
#10 0x080484ea in main () at main.c:4
So why isn't open_verify's call to open redirected to your open interposer?
First, let's look at the actual call instruction in frame 1:
(gdb) fr 1
#1 0xf7fe1211 in open_verify (name=0x804b008 "./foo.so", fbp=fbp@entry=0xffffc93c, loader=0xf7ffd938, whatcode=whatcode@entry=0, found_other_class=found_other_class@entry=0xffffc933, free_name=free_name@entry=true) at dl-load.c:1930
1930 dl-load.c: No such file or directory.
(gdb) x/i $pc-5
0xf7fe120c <open_verify+60>: call 0xf7ff3760 <open>
Compare this to the call instruction in frame 10:
(gdb) fr 10
#10 0x080484ea in main () at main.c:4
4 void *h = dlopen("./foo.so", RTLD_LAZY);
(gdb) x/i $pc-5
0x80484e5 <main+24>: call 0x80483c0 <dlopen@plt>
Notice anything different?
That's right: the call from main goes through the procedure linkage table (PLT), which the dynamic loader (ld-linux.so.2) resolves to the appropriate definition.
But the call to open in frame 1 does not go through the PLT (and thus is not interposable).
Why is that? Because that call must work before any other definition of open is available: it is used while libc.so.6 (which normally supplies the definition of open) is itself being loaded by the dynamic loader.
For this reason, the dynamic loader must be entirely self-contained (in fact, it contains a statically linked copy of a subset of libc).
My objective is to hook the open function that dlopen on linux uses.
For the reason above, this objective can't be achieved via LD_PRELOAD. You'll need to use some other hooking mechanism, such as patching the loader executable code at runtime.

Why does calling calloc in gdb not appear to zero out the memory?

I'm doing some experimentation with editing a process's memory while it's running, and I noticed that when I call calloc in a gdb'd process, the call seems to work and returns a pointer, but the memory does not appear to be initialized to 0:
(gdb) call calloc(1, 32)
$88 = (void *) 0x8d9d50
(gdb) x/8xw 0x8d9d50
0x8d9d50: 0xf74a87d8 0x00007fff 0xf74a87d8 0x00007fff
0x8d9d60: 0xfbfbfbfb 0xfbfbfbfb 0x00000000 0x9b510000
If I call memset on the resulting pointer, however, the initialization works just fine:
(gdb) call memset(0x8d9d50, 0, 32)
$89 = 9280848
(gdb) x/8xw 0x8d9d50
0x8d9d50: 0x00000000 0x00000000 0x00000000 0x00000000
0x8d9d60: 0x00000000 0x00000000 0x00000000 0x00000000
Interesting question. The answer is: on Linux (where I assume you ran your program) this:
(gdb) call calloc(1, 32)
doesn't call calloc from libc.so.6.
Rather it calls calloc from ld-linux.so.2. And that calloc is very minimal. It expects to be called only from ld-linux.so.2 itself, and it assumes that whatever pages it has access to are "clean" and do not require a memset. (That said, I could not reproduce the unclean calloc using glibc-2.19).
You can confirm this like so:
#include <stdlib.h>
int main()
{
void *p = calloc(1, 10);
return p == 0;
}
gcc -g foo.c -m32 && gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) start
Temporary breakpoint 1 at 0x8048426: file foo.c, line 4.
Starting program: /tmp/a.out
Temporary breakpoint 1, main () at foo.c:4
warning: Source file is more recent than executable.
4 void *p = calloc(1, 10);
(gdb) b __libc_calloc
Breakpoint 2 at 0xf7e845a0
(gdb) n
Breakpoint 2, 0xf7e845a0 in calloc () from /lib32/libc.so.6
(gdb) fin
Run till exit from #0 0xf7e845a0 in calloc () from /lib32/libc.so.6
0x0804843a in main () at foo.c:4
4 void *p = calloc(1, 10);
Note how the call from the program to calloc hit breakpoint #2.
(gdb) n
5 return p == 0;
(gdb) call calloc(1,32)
$1 = 134524952
Note that above call from GDB did not hit breakpoint #2.
Let's try again with:
(gdb) info func calloc
All functions matching regular expression "calloc":
Non-debugging symbols:
0x08048310 calloc@plt
0xf7fdc820 calloc@plt
0xf7ff16a0 calloc
0xf7e25450 calloc@plt
0xf7e845a0 __libc_calloc
0xf7e845a0 calloc
(gdb) info sym 0xf7ff16a0
calloc in section .text of /lib/ld-linux.so.2 ## this is the wrong one!
(gdb) break *0xf7ff16a0
Breakpoint 3, 0xf7ff16a0 in calloc () from /lib/ld-linux.so.2
(gdb) disable
(gdb) start
Temporary breakpoint 7 at 0x8048426: file foo.c, line 4.
Starting program: /tmp/a.out
Temporary breakpoint 7, main () at foo.c:4
4 void *p = calloc(1, 10);
(gdb) ena 3
(gdb) n
5 return p == 0;
Note that breakpoint #3 did not fire above (because the "real" __libc_calloc was called).
(gdb) call calloc(1,32)
Breakpoint 3, 0xf7ff16a0 in calloc () from /lib/ld-linux.so.2
The program being debugged stopped while in a function called from GDB.
Evaluation of the expression containing the function
(calloc) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb) bt
#0 0xf7ff16a0 in calloc () from /lib/ld-linux.so.2
#1 <function called from gdb>
#2 main () at foo.c:5
QED.
Update:
I don't see the ld-linux version in the output of "info func calloc"
I think what you see in info func depends on whether you have debug symbols installed. For a (64-bit) glibc with debug symbols, here is what I see:
(gdb) info func calloc
All functions matching regular expression "calloc":
File dl-minimal.c:
void *calloc(size_t, size_t); <<< this is the wrong one!
File malloc.c:
void *__libc_calloc(size_t, size_t); <<< this is the one you want!
Non-debugging symbols:
0x0000000000400440 calloc@plt
0x00007ffff7ddaab0 calloc@plt
0x00007ffff7a344e0 calloc@plt
Here is another way to figure out what calloc GDB thinks it should be calling:
(gdb) start
Temporary breakpoint 1 at 0x8048426: file foo.c, line 4.
Starting program: /tmp/a.out
Temporary breakpoint 1, main () at foo.c:4
warning: Source file is more recent than executable.
4 void *p = calloc(1, 10);
(gdb) p &calloc
$1 = (<text variable, no debug info> *) 0xf7ff16a0 <calloc>
(gdb) info sym 0xf7ff16a0
calloc in section .text of /lib/ld-linux.so.2
Or, for completeness, using 64-bit glibc with debug symbols:
(gdb) start
Temporary breakpoint 1 at 0x400555: file foo.c, line 4.
Starting program: /tmp/a.out
Temporary breakpoint 1, main () at foo.c:4
4 void *p = calloc(1, 10);
(gdb) p &calloc
$1 = (void *(*)(size_t, size_t)) 0x7ffff7df1bc0 <calloc>
(gdb) info sym 0x7ffff7df1bc0
calloc in section .text of /lib64/ld-linux-x86-64.so.2

How to use regexec with memory mapped files?

I am trying to find a regular expression in a large memory mapped file
by using regexec() function. I discovered that the program crashes when the file size
is the multiple of the page size.
Is there a regexec() function that has the length of the string
as additional argument?
Or:
How to find a regex in a memory mapped file?
Here is the minimal example that ALWAYS crashes
(if I run fewer than 3 threads, the program doesn't crash):
ls -la ttt.txt
-rwx------ 1 bob bob 409600 Jun 14 18:16 ttt.txt
gcc -Wall mal.c -o mal -lpthread -g && ./mal
[1] 11364 segmentation fault (core dumped) ./mal
And the program is:
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdio.h>
#include <assert.h>
#include <pthread.h>
#include <regex.h>
void* f(void*arg) {
int size = 409600;
int fd = open("ttt.txt", O_RDONLY);
char* text = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
fd = open("/dev/zero", O_RDONLY);
char* end = mmap(text + size, 4096, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
close(fd);
assert(text+size == end);
regex_t myre;
regcomp(&myre, "XXXXX", REG_EXTENDED);
regexec(&myre, text, 0, NULL, 0);
regfree(&myre);
return NULL;
}
int main(int argc, char* argv[]) {
int n = 10;
int i;
pthread_t t[n];
for (i = 0; i < n; ++i) {
pthread_create(&t[i], NULL, f, NULL);
}
for (i = 0; i < n; ++i) {
pthread_join(t[i], NULL);
}
return 0;
}
P.S.
This is the output from gdb:
gdb ./mal
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/bob/prog/c/mal...done.
(gdb) r
Starting program: /home/srdjan/prog/c/mal
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff77ff700 (LWP 11817)]
[New Thread 0x7ffff6ffe700 (LWP 11818)]
[New Thread 0x7ffff6799700 (LWP 11819)]
[New Thread 0x7fffeffff700 (LWP 11820)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6799700 (LWP 11819)]
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:72
72 ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
(gdb) bt
#0 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:72
#1 0x00007ffff78df254 in __regexec (preg=0x7ffff6798e80, string=0x7fffef79b000 'a' <repeats 200 times>..., nmatch=<optimized out>,
pmatch=0x0, eflags=<optimized out>) at regexec.c:245
#2 0x00000000004008e6 in f (arg=0x0) at mal.c:24
#3 0x00007ffff7bc4e9a in start_thread (arg=0x7ffff6799700) at pthread_create.c:308
#4 0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()
(gdb)
Celada correctly identifies the problem - the file data does not necessarily include a null terminator.
You could fix the problem by mapping a page of zeroes immediately after the file:
int fd;
char *text;
fd = open("ttt.txt", O_RDONLY);
text = mmap(NULL, 409600, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
fd = open("/dev/zero", O_RDONLY);
mmap(text + 409600, 4096, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
close(fd);
(Note that you can close fd immediately after the mmap(), because mmap() adds a reference to the open file description).
You should of course add error-checking to the above. Also, many UNIX systems support a MAP_ANONYMOUS flag which you can use instead of opening /dev/zero (but this is not in POSIX).
The problem is that regexec() is used to match a null-terminated string against the precompiled pattern buffer, but an mmaped file is not necessarily (indeed not usually) null-terminated. Thus, it is looking beyond the end of the file to find a NUL character (0 byte).
You would need a version of regexec() that takes a buffer and a size argument instead of a null-terminated string, but POSIX does not define one.
