malloc / calloc crash on side-thread on Linux - c

I'm writing a server-client application in C that shares some information. The server works in two-thread mode, with the main thread waiting for input and a side thread responding to clients' requests. The client works similarly, but it waits for user input (from stdin) and, if it receives a proper command, sends a request to the server and waits for the response. The waiting is done in a side thread that processes responses. While this all seems fine and works on an Ubuntu-based distro (I use Ultimate Edition 2.7), it crashes on some other distribution. Here's what happens.
The server works flawlessly as it is, but the client suffers from "glibc detected" crashes (I hope I typed it correctly). When it receives a response, it parses its structure, which contains:
a) a header,
b) some static identifiers,
c) a data section containing the length and the data itself.
What happens is:
a) the client receives a packet,
b) the client checks its size (at least sizeof(header) + sizeof(static_data) + sizeof(length) + the data, with the data as big as length says),
c) creates the structures - a conversion from the buffer of chars to the structures above,
d) creates some other structures storing those structures.
The structure is interpreted correctly. I tested it by sending a 'fixed' structure to the client through the server's interface and by printing the original, the sent and the received information. So this is not the problem. Everything is fine up to point c).
Up to point d) I work directly on the buffer used to receive incoming packets (the max size is specified and the buffer is of that size). To store the structures I copy them out into my own data (roughly like the sketch below). I do it by:
a) allocating new memory of the correct size,
b) copying the data.
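Roughly, the storing step looks like this (with placeholder type and field names, not my real ones):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: header, static identifier, length, then the data. */
typedef struct {
    uint32_t header;
    uint32_t static_id;
    uint32_t length;
    char     data[];      /* flexible array member holding the data section */
} RESPONSE_T;

RESPONSE_T* store_response(const char* buf, uint32_t data_len)
{
    /* a) allocate new memory of the correct size */
    RESPONSE_T* resp = (RESPONSE_T*)malloc(sizeof(RESPONSE_T) + data_len);
    if (resp == NULL)
        return NULL;
    /* b) copy the fixed part, then the data itself */
    memcpy(resp, buf, sizeof(RESPONSE_T));
    memcpy(resp->data, buf + sizeof(RESPONSE_T), data_len);
    return resp;
}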
I don't want to discuss the method itself; it's all fine as long as it works. But, as I said, it does not on the other distro. It fails on the malloc call allocating memory in point a), and it fails on everything. My guess was that it could be a problem with the thread-safety of malloc and/or printf on the other distro, but the main thread mostly idles in a scanf() call.
Back to the topic: it fails on anything:
char* buffer = (char*)malloc(fixed_size * sizeof(char));
STRUCT_T* st = (STRUCT_T*)malloc(sizeof(STRUCT_T));
and so on. No matter what I try to allocate, it always throws the "glibc detected" error and always points to malloc (even if it's actually calloc).
It makes me wonder what's wrong with it, and that is my question. It looks a bit like I'm overflowing some 'memory space', but I doubt it, as it always happens on the first received response. I'd be grateful for any help and can post more details if needed. The side threads are joinable.
Options with which I compile:
CC = gcc
CFLAGS = -Wall -ansi --pedantic -c -O0 -g -std=c99 -pthread
$(CC) $(CFLAGS) server.c -o server.o
gcc server.o $(OBJECTS) -o server -pthread -lm
and includes in client.c file:
sys/fcntl.h
netdb.h
errno.h
stdio.h
unistd.h
stdlib.h
string.h
time.h
pthread.h
math.h
I'm not a newbie with C and Linux, but I mostly work on Windows and C++, so this is rather disturbing. And, as I said, it works just fine on the distro I use, but not on the other one, even though the buffer is parsed correctly.
Thanks in advance.

When malloc crashes, it's usually because you have previously stepped on the data it uses to manage itself (and free). It's difficult or impossible to diagnose at the point of the crash, because the problem really happened at some earlier time. As has already been suggested, the best way to catch where that earlier memory overwrite occurred is to run your program through a tool like valgrind, Purify, Insure++, etc. It will inform you when you overwrite something that you shouldn't. valgrind is free software and is likely to be installed already. It can be as simple as sticking the word valgrind in front of everything else on your client's invocation string.
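For example (adjust to however you normally start the client):
valgrind --tool=memcheck --track-origins=yes ./client
The first invalid write it reports is usually the real culprit, rather than the malloc or calloc call that later aborts.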

Related

Prevent my C code from printing (seriously slows down the execution)

I have an issue.
I finally found a way to use an external library to solve my numerical systems. This library automatically prints the matrices. That is fine for dim=5, but for dim=1,000,000 you can see the problem...
Those parasitic printfs considerably slow down the execution, and I would like to get rid of them. The problem is: I don't know where they are! I looked in every .h and .c file in my library: they are nowhere to be found.
I suspect they are already included in the library itself, superlu.so, so I can't access them.
How could I possibly prevent my C code from printing anything during execution?
Here is my Makefile. I use the libsuperlu-dev library, directly downloaded from Ubuntu. The .so file was already there.
LIB = libsuperlu.so

main: superlu.o read_file.o main.o sample_arrays.o super_csr.o
	cc $^ -o $@ $(LIB)

clean:
	rm *.o
	rm main
Just to explain the LD_PRELOAD method that was mentioned, which I sometimes use precisely for this purpose (or, on the contrary, to add some printfs, for example when I want to pipe the output of a GUI), here is how you can do a rudimentary version of it.
myprint.c:
int printf(const char *fmt, ...){
    (void)fmt;
    return 0;
}
int putchar(int c){
    (void)c;
    return 0;
}
Then
gcc -shared -fPIC -std=gnu99 -o myprint.so myprint.c
Then
LD_PRELOAD=./myprint.so ./main
This forces the load of your printf and putchar symbols before any other library has the opportunity to provide them first. So, no printing occurs. At least none done with printf. But you may have to add some other functions to the list, such as fprintf, fputc, fputs, puts, ... (see the sketch below).
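For instance, the stub file could grow to something like this (stub only what you actually need):

#include <stdio.h>   /* for FILE */

int fprintf(FILE* f, const char* fmt, ...) { (void)f; (void)fmt; return 0; }
int puts(const char* s)                    { (void)s; return 0; }
int fputs(const char* s, FILE* f)          { (void)s; (void)f; return 0; }
int fputc(int c, FILE* f)                  { (void)c; (void)f; return 0; }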
And of course, another problem with overriding these f-functions (and possibly the others too) is that you might also suppress some wanted behaviour, such as writing files or interacting with some devices.
It may be even worse if the printing is done with the low-level write function. That one you very likely can't afford to override wholesale (unless you override it with a wrapper that calls the real write, looked up manually via dlsym with RTLD_NEXT), filtering out only the calls you want to drop, based on the target file descriptor (1) or on the content of the written data.
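A rough, minimal sketch of what such a write wrapper could look like (for illustration only; it drops anything aimed at stdout and forwards everything else):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

ssize_t write(int fd, const void *buf, size_t count) {
    static ssize_t (*real_write)(int, const void *, size_t);
    if (!real_write)   /* look up the libc write the first time through */
        real_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");
    if (fd == 1)       /* silently swallow anything written to stdout */
        return (ssize_t)count;
    return real_write(fd, buf, count);
}

Build it with gcc -shared -fPIC -o mywrite.so mywrite.c -ldl and preload it the same way.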
Note: if you want to verify whether libsuperlu.so is responsible for the printing, you can check with nm whether it refers to some well-known printing functions, such as printf.
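For example (the exact library path depends on your system; this just greps the dynamic symbol table):
nm -D /usr/lib/libsuperlu.so | grep -E 'printf|puts|putc'
Undefined (U) entries for those names mean the library calls them.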

Silent symbol collision?

Posted on the Raspberry Stackexchange before, closed as off-topic, so here I am.
The following program compiles cleanly with gcc main.c -lpigpio -lpthread -Wall -Wextra. I'm using gcc 8.3.0 (crosscompiler or normal compiler doesn't change anything).
// main.c
#include <stdio.h>
#include <pigpio.h>

void listen() {
    printf("hi from listen\n");
}

int main() {
    gpioInitialise();
    printf("hi from main\n");
    gpioTerminate();
    return 0;
}
No warnings, yet the program happily prints
hi from listen
hi from main
The cause seems to be that gpioInitialise() tries to call listen(2) from <sys/socket.h>, but instead calls the function from main.c. Of course, this doesn't work and results in an error message from the pigpio library:
xxxx-xx-xx xx:xx:xx pthSocketThread: setsockopt() fail, closing socket -1
xxxx-xx-xx xx:xx:xx pthSocketThread: accept failed (Bad file descriptor)
Question: How come gcc links a function with a completely different signature without throwing any errors? I know about static, but I want to make sense of this to learn from it. Surely it can't be as simple as listen(2) not being declared static?
The linker allows you to interpose symbols like this for debugging, profiling, etc. Unfortunately, at link stage the type information is generally not available, so the linker cannot perform any sanity checking to verify that the expected and actual signatures match. One might argue that with -g this would be possible, but no one cared enough to implement it (this might be a good home project though).
static would help in your case because it would prevent your listen from being exported, and so libc's listen would not be overridden.
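Applied to your main.c, the fix would look roughly like this:

// main.c
#include <stdio.h>
#include <pigpio.h>

/* static keeps this symbol local to the translation unit, so pigpio's
   internal call to listen(2) now resolves to the libc version. */
static void listen() {
    printf("hi from listen\n");
}

int main() {
    gpioInitialise();
    listen();                 /* runs only when called explicitly */
    printf("hi from main\n");
    gpioTerminate();
    return 0;
}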
As a side note, on ELF systems (Linux) the situation is even worse, because they allow runtime symbol interposition (i.e. libraries compete over who wins a symbol name at program startup). This is typically resolved with static or -fvisibility=hidden (unfortunately many libraries do not use them).

Inconsistent recv() behavior

I'm running a little internal CTF to teach people some computer security basics, and I've run into some strange behavior. The following is the handle function of a forking TCP server. It is just a cute little buffer overflow demonstration (taken from CSAW CTF).
When testing, I only ever bothered sending it 4097 bytes worth of data, because that will successfully overflow into the backdoor variable. However, many of the participants decided to try to send exactly 4099 bytes and this doesn't actually work. I'm not entirely sure why.
In GDB, recving 4099 bytes works just fine, but otherwise it does not. I've spent a good amount of time debugging this now, as I'd like a good explanation for everybody as to why the service behaved as it did. Is it some sort of quirk with the recv() call or am I doing something fundamentally wrong here?
void handle(int fd)
{
    int backdoor = 0;
    char attack[4096];

    send(fd, greeting, strlen(greeting), 0);
    sleep(3);
    recv(fd, attack, 0x1003, 0);

    if (backdoor)
    {
        dup2(fd, 0); dup2(fd, 1); dup2(fd, 2);
        char* argv[] = {"/bin/cat", "flag", NULL};
        execve(argv[0], argv, NULL);
        exit(0);
    }
    send(fd, nope, strlen(nope), 0);
}
Edit
The executable was compiled with:
clang -o backdoor backdoor.c -O0 -fno-stack-protector
I did not use different optimization settings for debugging / the live executable. I can run the following command:
python -c "print 'A'*4099" | nc <ip> <port>
and this will not work. I then attach to the running process via GDB (setting a breakpoint directly after the recv call), run the above command again, and it does work. I have repeated this multiple times with some variations, always with the same results.
Could it be something to do with the way the OS queues excess bytes sent to the socket? When I send 4099 bytes with the above command, I am actually sending 4100 (Python's print appends a newline implicitly). This means the trailing newline doesn't fit into the recv and is left for the next call to recv to clean up. I still can't figure out how GDB could influence this at all, but it's just a theory.
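(Side note: I also realize a single recv on a TCP socket isn't guaranteed to return all 0x1003 bytes in one go; a robust reader would loop over short reads, roughly like this sketch.)

#include <sys/types.h>
#include <sys/socket.h>

/* Sketch: keep calling recv until `want` bytes have arrived,
   the peer closes the connection (returns 0) or an error occurs (-1). */
static ssize_t recv_all(int fd, char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = recv(fd, buf + got, want - got, 0);
        if (n <= 0)
            return n;
        got += (size_t)n;
    }
    return (ssize_t)got;
}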
... am I doing something fundamentally wrong here?
Yes, you are expecting undefined behaviour to be predictable. It isn't.
If you compile that function with gcc, using -O3, then you'll get a warning about exceeding the size of the receive buffer (but of course you already knew that); but you'll also get a binary which does not actually bother to check backdoor. If you use clang, you don't get the warning, but you get a binary which doesn't even allocate space for backdoor.
The reason is clear: modifying backdoor through a "backdoor" is undefined behaviour, and the compiler is under no obligation to do anything you might consider logical or predictable in the face of undefined behaviour. In particular, it's allowed to assume that the undefined behaviour never happens. Since no valid program could mutate backdoor, the compiler is allowed to assume that backdoor never gets mutated, and hence it can ditch the code inside the if block as unreachable.
You don't mention how you're compiling this program, but if you're compiling without optimization to use gdb and with optimisation when you don't plan to use gdb, then you should not be surprised that undefined behaviour is handled differently. On the other hand, even if you are compiling the program with the same compiler and options in both cases, you still shouldn't be surprised, since undefined behaviour is, as it says, undefined.
Declaring backdoor as volatile might prevent the optimization. Although that's hardly the point, is it?
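That is, the one-line change would be:
volatile int backdoor = 0;   /* the compiler can no longer assume it is never written */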
Note: I'm using gcc version 4.8.1 and clang version 3.4. Different versions (and even different builds) might have different results.

Mudflap and pointer arrays

I've just implemented a pretty complicated piece of software, but my school's testing system won't take it.
The system uses the so-called mudflap library, which is supposed to be better at catching illegal memory accesses. As a consequence, my program generates segfaults when run on the school's testing system (I submit the source code and the testing system compiles it itself, using the mudflap library).
I tried to isolate the problematic code in my program, and it seems to boil down to something as simple as pointer arrays. Mudflap doesn't seem to like them.
Below is a very simple piece of code that works with a pointer array:
#include <stdlib.h>
#include <string.h>   /* for strcpy */

int main()
{
    char** rows;
    rows = (char**)malloc(sizeof(char*) * 3);
    rows[0] = (char*)malloc(sizeof(char) * 4);
    rows[1] = (char*)malloc(sizeof(char) * 4);
    rows[2] = (char*)malloc(sizeof(char) * 4);
    strcpy(rows[0], "abc");
    strcpy(rows[1], "abc");
    strcpy(rows[2], "abc");
    free(rows[0]); free(rows[1]); free(rows[2]);
    free(rows);
    return 0;
}
This will generate a segfault with mudflap. In my opinion, this is perfectly legal code.
Could you please explain to me what is wrong with it, and why it generates a segfault with mudflap?
Note: The program should be compiled under an amd64 linux system with g++ using the following commands:
export MUDFLAP_OPTIONS='-viol-segv -print-leaks';
g++ -Wall -pedantic -fmudflap -fmudflapir -lmudflap -g file.cpp
You have at least one problem here:
char** rows;
rows=(char**)malloc(3);
This allocates 3 bytes. On most platforms the allocator probably has a minimum of at least 4 bytes, which lets you get away with overwriting the buffer a bit. I'm guessing your mudflap library is stricter in its checking and catches the overwrite.
However, if you want an array of 3 char * pointers, you need 3 * sizeof(char *) bytes (12 on a 32-bit system, 24 on amd64).
Try changing these lines to:
char** rows;
rows=(char**)malloc(3 * sizeof(char *));
EDIT: Based on your modified code, I agree it looks correct now. The only thing I can suggest is that perhaps malloc() is failing and causing a NULL pointer access. If that's not the case, it sounds like a bug or misconfiguration in mudflap.
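To rule that out, you could add explicit NULL checks, roughly like this sketch:

#include <stdio.h>
#include <stdlib.h>

/* Allocate n rows of len bytes each, complaining loudly if malloc fails. */
static char** alloc_rows(size_t n, size_t len)
{
    char** rows = (char**)malloc(sizeof(char*) * n);
    if (rows == NULL) {
        fprintf(stderr, "malloc(rows) failed\n");
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        rows[i] = (char*)malloc(len);
        if (rows[i] == NULL) {
            fprintf(stderr, "malloc(row %zu) failed\n", i);
            return NULL;   /* earlier rows leak here; acceptable for a diagnostic */
        }
    }
    return rows;
}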

dlmalloc crash on Win7

For some time now I've been happily using dlmalloc for a cross-platform project (Windows, Mac OS X, Ubuntu). Recently, however, it seems that using dlmalloc leads to a crash-on-exit on Windows 7.
To make sure that it wasn't something goofy in my project, I created a super-minimal test program-- it doesn't do anything but return from main. One version ("malloctest") links to dlmalloc and the other ("regulartest") doesn't. On WinXP, both run fine. On Windows 7, malloctest crashes. You can see screencasts of the tests here.
My question is: why is this happening? Is it a bug in dlmalloc? Or has the loader in Windows 7 changed? Is there a workaround?
fyi, here is the test code (test.cpp):
#include <stdio.h>

int main() {
    return 0;
}
and here is the nmake makefile:
all: regulartest.exe malloctest.exe

malloctest.exe: malloc.obj test.obj
	link /out:$@ $**

regulartest.exe: test.obj
	link /out:$@ $**

clean:
	del *.exe *.obj
For brevity, I won't include the dlmalloc source in this post, but you can get it (v2.8.4) here.
Edit: See these other relevant SO posts:
Is there a way to redefine malloc at link time on Windows?
Globally override malloc in visual c++
Looks like a bug in the C runtime. Using Visual Studio 2008 on Windows 7, I reproduced the same problem. After some quick debugging (putting breakpoints in dlmalloc and dlfree), I saw that dlfree was being called with an address that dlmalloc had never returned, and then it hit an access violation shortly afterwards.
Thankfully, the C runtime's source code is distributed along with VS, so I could see that this call to free was coming from the __endstdio function in _file.c. The corresponding allocation was in __initstdio, and it was calling _calloc_crt to allocate its memory. _calloc_crt calls _calloc_impl, which calls HeapAlloc to get memory. _malloc_crt (used elsewhere in the C runtime, such as to allocate memory for the environment and for argv), on the other hand, calls straight to malloc, and _free_crt calls straight to free.
So, for the memory that gets allocated with _malloc_crt and freed with _free_crt, everything is fine and dandy. But for the memory that gets allocated with _calloc_crt and freed with _free_crt, bad things happen.
I don't know if replacing malloc like this is supported -- if it is, then this is a bug with the CRT. If not, I'd suggest looking into a different C runtime (e.g. MinGW or Cygwin GCC).
Using dlmalloc in cross-platform code is an oxymoron. Replacing any standard C functions (especially malloc and family) results in undefined behavior. The closest thing to a portable way to replace malloc is to use search-and-replace (not #define; that's also UB) on the source files so they call (for example) my_malloc instead of malloc. Note that internal C library functions will still use their own malloc, so if the two allocators conflict, things will still blow up. Basically, trying to replace malloc is just really misguided. If your system really has a broken malloc implementation (too slow, too much fragmentation, etc.), then you need to do the replacement in an implementation-specific way, and disable it on all systems except the ones where you've carefully checked that your implementation-specific replacement works correctly.
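A rough sketch of that renaming approach (my_malloc, my_free and the USE_DLMALLOC macro are just example names; this assumes dlmalloc is built with its USE_DL_PREFIX option so its entry points are named dlmalloc/dlfree and don't clash with the CRT's malloc):

/* my_alloc.h */
#ifndef MY_ALLOC_H
#define MY_ALLOC_H
#include <stddef.h>
void* my_malloc(size_t n);
void  my_free(void* p);
#endif

/* my_alloc.c */
#include "my_alloc.h"
#ifdef USE_DLMALLOC              /* define only on platforms where this was verified */
void* dlmalloc(size_t);          /* provided by malloc.c built with USE_DL_PREFIX */
void  dlfree(void*);
void* my_malloc(size_t n) { return dlmalloc(n); }
void  my_free(void* p)    { dlfree(p); }
#else                            /* everywhere else: just use the system allocator */
#include <stdlib.h>
void* my_malloc(size_t n) { return malloc(n); }
void  my_free(void* p)    { free(p); }
#endif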
