Heap corruption in HP-UX? - c

I'm trying to understand what's going wrong with a program run in HP-UX 11.11 that results in a SIGSEGV (11, segmentation fault):
(gdb) bt
#0 0x737390e8 in _sigfillset+0x618 () from /usr/lib/libc.2
#1 0x73736a8c in _sscanf+0x55c () from /usr/lib/libc.2
#2 0x7373c23c in malloc+0x18c () from /usr/lib/libc.2
#3 0x7379e3f8 in _findbuf+0x138 () from /usr/lib/libc.2
#4 0x7379c9f4 in _filbuf+0x34 () from /usr/lib/libc.2
#5 0x7379c604 in __fgets_unlocked+0x84 () from /usr/lib/libc.2
#6 0x7379c7fc in fgets+0xbc () from /usr/lib/libc.2
#7 0x7378ecec in __nsw_getoneconfig+0xf4 () from /usr/lib/libc.2
#8 0x7378f8b8 in __nsw_getconfig+0x150 () from /usr/lib/libc.2
#9 0x737903a8 in __thread_cond_init_default+0x100 () from /usr/lib/libc.2
#10 0x737909a0 in nss_search+0x80 () from /usr/lib/libc.2
#11 0x736e7320 in __gethostbyname_r+0x140 () from /usr/lib/libc.2
#12 0x736e74bc in gethostbyname+0x94 () from /usr/lib/libc.2
#13 0x11780 in dnetResolveName (name=0x400080d8 "smtp.org.com", hent=0x737f3334) at src/dnet.c:64
..
The problem seems to be occurring somewhere inside libc! A system call trace ends with:
Connecting to server smtp.org.com on port 25
write(1, "C o n n e c t i n g t o s e ".., 51) .......................... = 51
open("/etc/nsswitch.conf", O_RDONLY, 0666) ............................... [entry]
open("/etc/nsswitch.conf", O_RDONLY, 0666) ................................... = 5
Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
Siginfo: si_code: I_NONEXIST, faulting address: 0x400118fc, si_errno: 0
PC: 0xc01980eb, instruction: 0x0d3f1280
exit(11) [implicit] ............................ WIFSIGNALED(SIGSEGV)|WCOREDUMP
Last instruction by the program:
struct hostent *him;
him = gethostbyname(name); // name == "smtp.org.com" as shown by gdb
Is this a problem with the system, or am I missing something?
Any guidance for digging deeper would be appreciated.
Thx.

Long story short: vsnprintf corrupted my heap under HP-UX 11.11.
vsnprintf was introduced in C99 (ISO/IEC 9899:1999) and "is equivalent to snprintf, with the variable argument list" (§7.19.6.12.2), snprintf (§7.19.6.5.2): "If n is zero, nothing is written".
Well, HP UX 11.11 doesn't comply with this specification. When 2nd arg == 0, arguments are written at the end of the 1st arg.. which, of course, corrupts the heap (I allocate no space when maxsize==0, given that nothing should be written).
HP manual pages are unclear ("It is the user's responsibility to ensure that enough storage is available."), nothing is said regarding the case of maxsize==0. Nice trap.. at the very least, the WARNINGS section of the man page should warn std-compliant users..
It's an egg/chicken pb: vnsprintf is variadic, so for the "user's responsibility" to ensure that enough storage is available" the "user's responsibility" must first know how much space is needed. And the best way to do that is to call vnsprintf with 2nd arg == 0: it should then return the amount of space required and sprintfs nothing.. well, except HP's !
One solution to use vnsprintf under this std violation to determine needed space: malloc 1 byte more to your buffer (1st arg) and call vnsprintf(buf+buf.length,1,..). This only puts a \0 in the new byte you allocated. Silly, but effective. If you're under wchar conditions, malloc(sizeof..).
Anyway, workaround is trivial : never call v/snprintf under HP-UX with maxsize==0!
I now have a happy stable program!
Thanks to all contributers.
Heap corruption through vsnprintf under HP-UX B11.11
This program prints "##" under Linux/Cygwin/..
It prints "#fooo#" under HP-UX B11.11:
#include <stdarg.h>
#include <stdio.h>
const int S=2;
void f (const char *fmt, ...) {
va_list ap;
int actualLen=0;
char buf[S];
bzero(buf, S);
va_start(ap, fmt);
actualLen = vsnprintf(buf, 0, fmt, ap);
va_end(ap);
printf("#%s#\n", buf);
}
int main () {
f("%s", "fooo");
return 0;
}

Whenever this situation happens to me (unexpected segfault in a system lib), it is usually because I did something foolish somewhere else, i.e. buffer overrun, double delete on a pointer, etc.
In those instances where my mistake is not obvious, I use valgrind. Something like the following is usually sufficient:
valgrind -v --leak-check=yes --show-reachable=yes ./myprog
I assume valgrind may be used in HP-UX...

Your stack trace is in malloc which almost certainly means that somewhere you corrupted one of malloc's data structures. As a previous answer said, you likely have a buffer overrun or underrun and corrupted one of the items allocated off the heap.
Another explanation is that you tried to do a free on something that didn't come from the heap, but that's less likely--that would probably have crashed right in free.

Reading the (OS X) manpage says that gethostbyname() returns a pointer, but as far as I can tell may not be allocating memory for that pointer. Do you need to malloc() first? Try this:
struct hostent *him = malloc(sizeof(struct hostent));
him = gethostbyname(name);
...
free(him);
Does that work any better?
EDIT: I tested this and it's probably wrong. Granted I used the bare string "stmp.org.com" instead of a variable, but both versions (with and without malloc()ing) worked on OS X. Maybe HP-UX is different.

Related

How to find which function from program reads from stdin?

I am currently working on fuzzing a program, and the code base is huge. To improve the performance, I am using persistent mode by creating a loop around the necessary function or code that reads from stdin. Right now using gdb, I am able to enumerate all the functions being used by the program like this:
set logging on
set confirm off
rbreak ^[^#]*$
run the binary
continue
This gives me all the functions that the program uses, but I think an easier way than reading hundreds of lines is by finding the function that reads from stdin. How would I be able to find the function that reads from stdin?
Since you're running Linux, virtually every function that reads from a stream (such as stdin) will ultimately do a read system call. (Less often, they will call readv.)
The C prototype for the read function is
ssize_t read(int fd, void *buf, size_t count);
and like most Linux system calls, this is pretty much the prototype for the actual system call (all the integer and pointer types are put into registers.)
On x86_64, the first argument to a system call will be in register rdi. (See Calling conventions.) A value of 0 means stdin.
So first we will tell GDB to stop the process upon entering the read system call, adding a condition to stop only when its first argument is 0:
(gdb) catch syscall read
Catchpoint 1 (syscall 'read' [0])
(gdb) condition 1 $rdi == 0
(gdb) run
Starting program: cat
Catchpoint 1 (call to syscall read), 0x00007fffff13b910 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
Now do a backtrace to see all the functions in the call stack:
(gdb) bt
#0 0x00007fffff13b910 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007fffff0d2a84 in __GI__IO_file_xsgetn (fp=0x7fffff3f98c0 <_IO_2_1_stdin_>, data=<optimized out>, n=4096)
at fileops.c:1442
#2 0x00007fffff0c7ad9 in __GI__IO_fread (buf=<optimized out>, size=1, count=4096, fp=0x7fffff3f98c0 <_IO_2_1_stdin_>)
at iofread.c:38
#3 0x00000000080007c2 in copy () at cat.c:6
#4 0x00000000080007de in main () at cat.c:12
(gdb) fr 3
#3 0x00000000080007c2 in copy () at cat.c:6
6 while ((n=fread(buf, 1, sizeof buf, stdin)) > 0)
(gdb) fr 4
#4 0x00000000080007de in main () at cat.c:12
12 copy();
How would I be able to find the function that reads from stdin?
In general, your question is equivalent to the halting problem. Consider this function:
ssize_t foo(int fd, void *buf, size_t count) { return read(fd, buf, count); }
Does this function read from stdin? It may or may not (depending on inputs), and therein lies the problem.
P.S. Your method of enumerating all functions that are called is exceedingly inefficient. You should look into building your program with -finstrument-functions instead. Example.

malloc() and memset() behavior

I wrote some code to see how malloc() and memset() behave, and I found a case where I don't know what's going on.
I used malloc() to allocate 15 bytes of memory for a character array, and I
wanted to see what would happen if I used memset() incorrectly to set 100 bytes of memory in the pointer I created. I expected to see that memset() had set 15 bytes (and possibly trash some other memory). What I'm seeing when I run the program is that it's setting 26 bytes of memory to the character that I coded.
Any idea why there are 26 bytes allocated for the pointer I created? I'm compiling with gcc and glibc. Here's the code:
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#define ARRLEN 14
int main(void) {
/* + 1 for the null terminator */
char *charptr = malloc((sizeof(*charptr) * ARRLEN) + 1);
if (!charptr)
exit(EXIT_FAILURE);
memset(charptr, '\0', (sizeof(*charptr) * ARRLEN) + 1);
/* here's the intentionally incorrect call to memset() */
memset(charptr, 'a', 100);
printf("sizeof(char) ------ %ld\n", sizeof(char));
printf("sizeof(charptr) --- %ld\n", sizeof(charptr));
printf("sizeof(*charptr) --- %ld\n", sizeof(*charptr));
printf("sizeof(&charptr) --- %ld\n", sizeof(&charptr));
printf("strlen(charptr) --- %ld\n", strlen(charptr));
printf("charptr string ---- >>%s<<\n", charptr);
free(charptr);
return 0;
}
This is the output I get:
sizeof(char) ------ 1
sizeof(charptr) --- 8
sizeof(*charptr) --- 1
sizeof(&charptr) --- 8
strlen(charptr) --- 26
charptr string ---- >>aaaaaaaaaaaaaaaaaaaaaaaa<<
First of all, this is undefined behavior, so anything can happen; as said in a comment, on my machine I get your exact same behavior with optimizations disabled, but turning on optimizations I get a warning about a potential buffer overflow at compile time (impressive job gcc!) and a big crash at runtime. Even better, if I print it with a puts before the printf calls I get it printed with a different number of a.
Still, I have the dubious luck to have the exact same behavior as you, so let's investigate. I compiled your program with no optimization and debug information
[matteo#teokubuntu ~/scratch]$ gcc -g memset_test.c
then I fired up the debugger and added a breakpoint on the first printf, just after the memset.
Reading symbols from a.out...done.
(gdb) break 20
Breakpoint 1 at 0x87e: file memset_test.c, line 20.
(gdb) r
Starting program: /home/matteo/scratch/a.out
Breakpoint 1, main () at memset_test.c:20
20 printf("sizeof(char) ------ %ld\n", sizeof(char));
now we can set a hardware write breakpoint on the 26th memory location pointed by charptr
(gdb) p charptr
$1 = 0x555555756260 'a' <repeats 100 times>
(gdb) watch charptr[26]
Hardware watchpoint 2: charptr[26]
... and so...
(gdb) c
Continuing.
Hardware watchpoint 2: charptr[26]
Old value = 97 'a'
New value = 0 '\000'
_int_malloc (av=av#entry=0x7ffff7dcfc40 <main_arena>, bytes=bytes#entry=1024) at malloc.c:4100
4100 malloc.c: File o directory non esistente.
(gdb) bt
#0 _int_malloc (av=av#entry=0x7ffff7dcfc40 <main_arena>, bytes=bytes#entry=1024) at malloc.c:4100
#1 0x00007ffff7a7b0fc in __GI___libc_malloc (bytes=1024) at malloc.c:3057
#2 0x00007ffff7a6218c in __GI__IO_file_doallocate (fp=0x7ffff7dd0760 <_IO_2_1_stdout_>) at filedoalloc.c:101
#3 0x00007ffff7a72379 in __GI__IO_doallocbuf (fp=fp#entry=0x7ffff7dd0760 <_IO_2_1_stdout_>) at genops.c:365
#4 0x00007ffff7a71498 in _IO_new_file_overflow (f=0x7ffff7dd0760 <_IO_2_1_stdout_>, ch=-1) at fileops.c:759
#5 0x00007ffff7a6f9ed in _IO_new_file_xsputn (f=0x7ffff7dd0760 <_IO_2_1_stdout_>, data=<optimized out>, n=23)
at fileops.c:1266
#6 0x00007ffff7a3f534 in _IO_vfprintf_internal (s=0x7ffff7dd0760 <_IO_2_1_stdout_>,
format=0x5555555549c8 "sizeof(char) ------ %ld\n", ap=ap#entry=0x7fffffffe330) at vfprintf.c:1328
#7 0x00007ffff7a48f26 in __printf (format=<optimized out>) at printf.c:33
#8 0x0000555555554894 in main () at memset_test.c:20
(gdb)
So, it's just malloc code invoked (more or less indirectly) by printf doing its stuff on the memory block immediately adjacent to the one it gave you (possibly marking it as used).
Long story short: you took memory that wasn't yours and it's now getting modified by its rightful owner at the first occasion when he needed it; nothing particularly strange or interesting.
I used malloc() to allocate 15 bytes of memory for a character array,
and I wanted to see what would happen if I used memset() incorrectly
to set 100 bytes of memory in the pointer I created.
What happens is undefined behavior as far as the language standard is concerned, and whatever actually does happen in a particular case is not predictable from the source code and may not be consistent across different C implementations or even different runs of the same program.
I expected to see
that memset() had set 15 bytes (and possibly trash some other memory).
That would be one plausible result, but that you had any particular expectation at all is dangerous. Do not assume that you can predict what manifestation UB will take, not even based on past experience. And since you should not do that, and cannot learn anything useful from that way, it is not worthwhile to experiment with UB.
What I'm seeing when I run the program is that it's setting 26 bytes
of memory to the character that I coded.
Any idea why there are 26 bytes allocated for the pointer I created?
Who says your experiment demonstrates that to be the case? Not only the memset() but also the last printf() exhibits UB. The output tells you nothing other than what the output was. That time.
Now in general, malloc may reserve a larger block than you request. Many implementations internally manage memory in chunks larger than one byte, such as 16 or 32 bytes. But that has nothing to do one way or another with the definedness of your program's behavior, and it does not speak to your output.

signal 11 SIGSEGV in malloc?

I usually love good explained questions and answers. But in this case I really can't give any more clues.
The question is: why malloc() is giving me SIGSEGV? The debug bellow show the program has no time to test the returned pointer to NULL and exit. The program quits INSIDE MALLOC!
I'm assuming my malloc in glibc is just fine. I have a debian/linux wheezy system, updated, in an old pentium (i386/i486 arch).
To be able to track, I generated a core dump. Lets follow it:
iguana$gdb xadreco core-20131207-150611.dump
Core was generated by `./xadreco'.
Program terminated with signal 11, Segmentation fault.
#0 0xb767fef5 in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0 0xb767fef5 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1 0xb76824bc in malloc () from /lib/i386-linux-gnu/libc.so.6
#2 0x080529c3 in enche_pmovi (cabeca=0xbfd40de0, pmovi=0x...) at xadreco.c:4519
#3 0x0804b93a in geramov (tabu=..., nmovi=0xbfd411f8) at xadreco.c:1473
#4 0x0804e7b7 in minimax (atual=..., deep=1, alfa=-105000, bet...) at xadreco.c:2778
#5 0x0804e9fa in minimax (atual=..., deep=0, alfa=-105000, bet...) at xadreco.c:2827
#6 0x0804de62 in compjoga (tabu=0xbfd41924) at xadreco.c:2508
#7 0x080490b5 in main (argc=1, argv=0xbfd41b24) at xadreco.c:604
(gdb) frame 2
#2 0x080529c3 in enche_pmovi (cabeca=0xbfd40de0, pmovi=0x ...) at xadreco.c:4519
4519 movimento *paux = (movimento *) malloc (sizeof (movimento));
(gdb) l
4516
4517 void enche_pmovi (movimento **cabeca, movimento **pmovi, int c0, int c1, int c2, int c3, int p, int r, int e, int f, int *nmovi)
4518 {
4519 movimento *paux = (movimento *) malloc (sizeof (movimento));
4520 if (paux == NULL)
4521 exit(1);
Of course I need to look at frame 2, the last on stack related to my code. But the line 4519 gives SIGSEGV! It does not have time to test, on line 4520, if paux==NULL or not.
Here it is "movimento" (abbreviated):
typedef struct smovimento
{
int lance[4]; //move in integer notation
int roque; // etc. ...
struct smovimento *prox;// pointer to next
} movimento;
This program can load a LOT of memory. And I know the memory is in its limits. But I thought malloc would handle better when memory is not available.
Doing a $free -h during execution, I can see memory down to as low as 1MB! Thats ok. The old computer only has 96MB. And 50MB is used by the OS.
I don't know to where start looking. Maybe check available memory BEFORE a malloc call? But that sounds a wast of computer power, as malloc would supposedly do that. sizeof (movimento) is about 48 bytes. If I test before, at least I'll have some confirmation of the bug.
Any ideas, please share. Thanks.
Any crash inside malloc (or free) is an almost sure sign of heap corruption, which can come in many forms:
overflowing or underflowing a heap buffer
freeing something twice
freeing a non-heap pointer
writing to freed block
etc.
These bugs are very hard to catch without tool support, because the crash often comes many thousands of instructions, and possibly many calls to malloc or free later, in code that is often in a completely different part of the program and very far from where the bug is.
The good news is that tools like Valgrind or AddressSanitizer usually point you straight at the problem.

find the source of crash in C program

I have a code base where the process running on linux is crashing in free and sometimes in malloc:
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xf7424e30 in raise () from /lib/libc.so.6
#2 0xf7426765 in abort () from /lib/libc.so.6
#3 0xf7469d33 in malloc_printerr () from /lib/libc.so.6
#4 0xf746e7bc in free () from /lib/libc.so.6
#5 0xf6047e25 in myFree (mem_ptr=0x82165d8) at ../my_code/mylib.c:78
#6 0xf6014a10 in FreeBuffer (buffer=0x82042f8)
In the code I am not seeing anything fishy at the place where memory freeing is happening.
Function myFree() has nothing but a free() function call.
void FreeBuffer(struct MY_BUFFER *buffer)
{
if (buffer)
{
myFree(buffer);
}
}
void myFree(void *mem_ptr)
{
free(mem_ptr);
}
I tried using MALLOC_CHECK_ but it was not of any help.
I suspect some where the heap is getting corrupted and want to find that.
Any hint to proceed to debug the process in such cases?
There is hard to be 100% sure but most probably you are trying to free memory twice. It's great that you are checking if pointer is not NULL before trying to free it
if (buffer)
{
myFree(buffer);
}
but you are probably NOT setting pointer NULL after doing exact free job - that's my guess.
Check if you have something like this
struct MY_BUFFER *buffer;
//do something with buffer
FreeBuffer(buffer)
buffer = NULL;
You can do this as well inside FreeBuffer() function but to do so you will have to pass address of pointer AKA pointer to pointer so the definition will be like this FreeBuffer(struct MY_BUFFER **buffer)

Segmenation fault in malloc()?

I am getting a segmentation fault inside of the malloc() routine. Here is the stacktrace from gdb:
#0 0x00007ffff787e882 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff787fec6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff7882a45 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000000000403ab0 in xmalloc (size=1024) at global.c:14
#4 0x00000000004020fb in processConnectionQueue (arguments=0x60a4e0)
at connection.c:117
#5 0x00007ffff7bc4e9a in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007ffff78f24bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000000000000000 in ?? ()
What's going on? What could cause malloc() to segfault?
EDIT: Here is the code from xmalloc(). It's pretty standard, and as you can see from the stacktrace it's calling malloc with a size of 1024.
void* xmalloc(size_t size)
{
void* result = malloc(size);
if(!result)
{
if(!size)
{
result = malloc(1);
}
if(!result)
{
fprintf(stderr, "Error allocating memory of size %zu\n", size);
exit(-1);
}
}
return result;
}
And line 117 in connection.c:
item->readBuffer = xmalloc(kInitialPacketBufferSize);
You are most likely seeing the effect of an error elsewhere in your code, that access memory outside an allocation. If you are lucky enough, your code can touch some of the internal values malloc uses to track allocations.
If you have the possibility, try linking your code with an allocation checker like libefence or similar, and use this to locate the real problem.
Can you check the line
#3 0x0000000000403ab0 in xmalloc (size=1024) at global.c:14
the global.c for the size=1024.
and also you can use some tools as Wegge mentioned to detect some issue regrading this segfault.
There are at least three basic causes of this.
You overran the item returned by malloc() and wrote something before its beginning or after its end.
You freed something twice, which in a way is a special case of
You freed something that wasn't a malloc() result.
Any of these actions will corrupt the malloc)) heap, which causes it to malfunction.

Resources