I am writing a stock market system that uses several threads to process the incoming orders.
The project was going fine until I added one more thread. When I launch that thread, my program segfaults; the segfault is caused by an invalid memory read in the new thread.
The segfault occurs only when the program is compiled with optimization -O2 and above.
After compiling the program with debug info using -g3, I ran valgrind with
valgrind ./marketSim
and got the following output about the segfault:
==2524== Thread 5:
==2524== Invalid read of size 4
==2524== at 0x402914: limitWorker (limit.c:4)
==2524== by 0x4E33D5F: start_thread (in /lib/libpthread-2.14.so)
==2524== Address 0x1c is not stack'd, malloc'd or (recently) free'd
==2524==
==2524==
==2524== Process terminating with default action of signal 11 (SIGSEGV)
==2524== Access not within mapped region at address 0x1C
==2524== at 0x402914: limitWorker (limit.c:4)
==2524== by 0x4E33D5F: start_thread (in /lib/libpthread-2.14.so)
The thread is launched like this:
pthread_t limit_thread;
pthread_create(&limit_thread, NULL, limitWorker, q);
q is a variable that is also passed to the other threads I initialize.
The limitWorker code is as follows:
void *limitWorker(void *arg){
    while(1){
        if ((!lsl->empty) && (!lbl->empty)) {
            if ((currentPriceX10 > lGetHead(lsl)->price1) && (currentPriceX10 < lGetHead(lbl)->price1)) {
                llPairDelete(lsl,lbl);
            }
        }
    }
    return NULL;
}
Line 4, the line which according to valgrind produces the segfault, is: void *limitWorker(void *arg){
Some more info: this is compiled using gcc 4.6.1. When using gcc 4.1.2 the program doesn't segfault, even when optimized, although its performance is much worse.
When the program is compiled using clang it also doesn't segfault when optimized.
Question
Am I making a mistake? Is it a gcc bug? What course of action should I follow?
If you want to take a look at the code the github page is https://github.com/spapageo/Stock-Market-Real-Time-System/
The code in question is in file marketSim.c and limit.c
EDIT: Valgrind specifies that the invalid read happens at line 4, which is the "head" of the function. I don't know compiler internals, so my naive thought was that the argument is wrong. BUT when I inspect the argument in gdb after the segfault, it is reported as optimized out (because the program is optimized), so I don't think that is the culprit.
If you are compiling for a 64-bit system, then 0x1c is the offset of the price1 field within the order struct. This implies that either (or both) of lsl->HEAD and lbl->HEAD are NULL pointers when the fault occurs.
Note that because your limitWorker() function includes no thread synchronisation outside of the llPairDelete() function, it is incorrect: the compiler is not obliged to reload those values on every iteration of the loop. You should be using a mutex to protect the linked lists even in the read-only paths.
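A minimal sketch of what a locked read path could look like (list_lock is a hypothetical mutex that does not exist in your code, and this assumes llPairDelete() does not also take the same lock internally, otherwise it would deadlock):
#include <pthread.h>

/* Hypothetical lock shared by every thread that touches lsl/lbl. */
extern pthread_mutex_t list_lock;

void *limitWorker(void *arg){
    (void)arg;
    while(1){
        pthread_mutex_lock(&list_lock);
        if ((!lsl->empty) && (!lbl->empty) &&
            (currentPriceX10 > lGetHead(lsl)->price1) &&
            (currentPriceX10 < lGetHead(lbl)->price1)) {
            llPairDelete(lsl, lbl);
        }
        pthread_mutex_unlock(&list_lock);
    }
    return NULL;
}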
Additionally, your lsl and lbl variables are multiply defined. You should declare them as extern in limit.h, and define them without the extern in limit.c.
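In other words, something along these lines (the element type name is a guess; use whatever type limit.h actually declares):
/* limit.h: declaration only, visible to every translation unit */
extern list *lsl;
extern list *lbl;

/* limit.c: the single definition */
list *lsl;
list *lbl;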
Related
How can I debug a C application that does not crash when attached with gdb and run inside of gdb?
It crashes consistently when run standalone - even the same debug build!
A few of us are getting this error with a C program written for BSD/Linux, and we are compiling on macOS with OpenSSL.
app(37457,0x7000017c7000) malloc: *** mach_vm_map(size=13835058055282167808) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
ERROR: malloc(buf->length + 1) failed!
I know, not helpful.
Recompiling the application with -g -rdynamic gives the same error. Ok, so now we know it isn't because of a release build as it continues to fail.
It works when running within a gdb debugging session though!!
$ sudo gdb app
(gdb) b malloc_error_break
Function "malloc_error_break" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (malloc_error_break) pending.
(gdb) run -threads 8
Starting program: ~/code/app/app -threads 8
[New Thread 0x1903 of process 45436]
warning: unhandled dyld version (15)
And it runs for hours. Ctrl-C out of gdb, run ./app -threads 8 on its own, and it crashes after a second or two (a few million iterations).
Obviously there's an issue within one of the threads. But those workers for the threads are pretty big (a few hundred lines of code). Nothing stands out.
Note that the threads each run their loops at roughly 20 million iterations per second.
macOS 10.12.3
Homebrew w/GNU gcc and openssl (linking to crypto)
P.S. I'm not too familiar with C, especially any type of debugging. Be kind and expressive/verbose in answers. :)
One debugging technique that is sometimes overlooked is to include debug prints in the code; of course it has its disadvantages, but it also has advantages. One thing you must keep in mind in the face of abnormal termination, though, is to make sure the printouts actually get printed. Often it's enough to print to stderr (but if that doesn't do the trick you may need to fflush the stream explicitly).
Another trick is to stop the program before the error occurs. This requires you to know when the program is about to crash, preferably as close as possible. You do this by using raise:
raise(SIGSTOP);
This does not terminate the program; it just suspends execution. Now you can attach with gdb using the command gdb <program-name> <pid> (use ps to find the pid of the process). In gdb you then have to tell it to ignore SIGSTOP:
> handle SIGSTOP ignore
Then you can set break-points. You can also step out of the raise function using the finish command (may have to be issued multiple times to return to your code).
This technique makes the program behave normally up to the moment you decide to stop it; hopefully the final part, running under gdb, does not alter the behaviour enough to hide the bug.
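A small sketch of that, with the pid printed so attaching is easier (stop_for_debugger is just an illustrative helper name):
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Call this just before the point where you expect the crash. */
static void stop_for_debugger(void)
{
    fprintf(stderr, "suspended, attach with: gdb <program-name> %d\n", (int)getpid());
    raise(SIGSTOP);   /* suspends execution until a debugger (or SIGCONT) resumes it */
}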
A third option is to use valgrind. Normally when you see these kinds of errors there are problems involved that valgrind will pick up, such as out-of-range accesses and uninitialized variables.
Many memory managers initialise memory to a known bad value to expose problems like this (e.g. Microsoft's CRT uses a range of values: 0xCD means uninitialised, 0xDD means already freed, etc.).
After each use of malloc, try memset'ing the memory to 0xCD (or some other constant value). This will allow you to identify uninitialised memory more easily with the debugger. Don't use 0x00, as this is a 'normal' value and will be harder to spot if it's wrong (it will also probably 'fix' your problem).
Something like:
void *memory = malloc(sizeof(my_object));
memset(memory, 0xCD, sizeof(my_object));
If you know the size of the blocks, you could do something similar before free (this is sometimes harder unless you know the size of your objects, or track it in some way):
memset(memory, 0xDD, sizeof(my_object));
free(memory);
I am trying to run a program to test buffer overflows, but when the program crashes it shows me a SIGSEGV error as follows:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004006c0 in main (argc=2, argv=0x7fffffffde78)
But the tutorial I am following gets the message below:
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
Because of this I am not able to get the exact memory location of the buffer overflow.
I have already used -fno-stack-protector while compiling my program, because before that I was getting a SIGABRT error.
Does anyone have any clue how I can get in sync with the tutorial?
I was able to figure out the difference between the two.
I was originally trying the code on 64-bit Ubuntu in VirtualBox.
I then installed 32-bit Ubuntu in VirtualBox, and now I get the same message as in the tutorial.
Another difference I noticed between the 64-bit and 32-bit OS is that on 32-bit we examine the stack using $esp, but on a 64-bit machine we have to use $rsp.
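For example, dumping the top sixteen stack slots in gdb differs only in the register and the word size:
(gdb) x/16wx $esp
(gdb) x/16gx $rsp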
SIGSEGV is the signal raised when your program attempts to access a memory location it is not supposed to. Two typical scenarios are:
Dereferencing an uninitialized pointer.
Accessing an array out of bounds.
Note, however, that even in these two cases there is no guarantee that SIGSEGV always happens, so don't expect the SIGSEGV message to always be the same, even with the same code.
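For illustration, both scenarios fit in a few lines (deliberately buggy; whether and where it actually faults is up to the platform, as noted above):
#include <stdio.h>

int main(void)
{
    int *p;                /* never initialized */
    int a[4];

    printf("%d\n", *p);    /* scenario 1: dereference an uninitialized pointer */
    a[100000] = 42;        /* scenario 2: write far outside the array bounds   */
    return 0;
}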
I'm having trouble identifying why valgrind is throwing this error:
==82185== Thread 2:
==82185== Use of uninitialised value of size 8
==82185== at 0x401B9A: proc_outconnection_thread (station.c:401)
==82185== by 0x4E3CDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==82185== by 0x51471AC: clone (in /usr/lib64/libc-2.17.so)
==82185==
the pass im sending is 'this'
==82185== Use of uninitialised value of size 8
==82185== at 0x401BCA: proc_outconnection_thread (station.c:403)
==82185== by 0x4E3CDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==82185== by 0x51471AC: clone (in /usr/lib64/libc-2.17.so)
==82185==
As a bit of background information, the program I'm trying to create in C simulates a train station that uses TCP connections as "trains". I'm trying to get the program to use threads in order to both listen for and try to connect to other stations (other instances of the program).
The problem seems to occur when passing an internal data struct to a thread creation function via an argument struct that contains a pointer to the internal data struct. This way each thread has a pointer to the program's internal data.
For testing, the file is compiled with:
gcc -pthread -g -o station station.c -Wall -pedantic -std=gnu99
To produce my error, begin an instance of station with valgrind ./station tom authf logfile 3329 127.0.1.1
and then begin another instance with valgrind ./station tim authf logfile 3328 127.0.1.1
Due to an if statement in main, the station named tim will attempt to connect to tom, and tom will create a socket and listen for tim's attempt to connect. The connection seems to be successful; however, for some reason I'm also unable to flush the connection to send anything between them, which I have a feeling may be because of what valgrind is telling me.
What's so strange is that when a thread is created for the connection on tom's instance, no errors in valgrind are thrown despite having a very similar procedure for creating the thread (the same arguments are passed through the argument pointer and the same assignments are made).
Could it be a false positive for tim's end, or am I doing something severely wrong here?
Your problem is passing a pointer to a local variable into the thread function. The simplest workaround is to declare this variable as static or global, but that is not good if several threads use that variable.
It's better to allocate memory for the structure, initialize it, and pass that into the thread function:
ArgStruct *argStruct = malloc(sizeof(ArgStruct));
if (argStruct == NULL) {
    fprintf(stderr, "Cant alloc memory!\n");
    exit(98);
}
argStruct->internalStruct = internal;
argStruct->clientCon = fdopen(fd, "r+");
pthread_create(&threadId, NULL, proc_outconnection_thread, (void *)argStruct);
Also, don't forget to free this memory (at the end of proc_outconnection_thread() for example).
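A minimal sketch of that cleanup (the field names are taken from the snippet above; the struct definition itself is a guess):
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void *internalStruct;
    FILE *clientCon;
} ArgStruct;

void *proc_outconnection_thread(void *arg)
{
    ArgStruct *argStruct = (ArgStruct *)arg;

    /* ... do the actual work with argStruct->internalStruct and argStruct->clientCon ... */

    fclose(argStruct->clientCon);   /* close the stream opened with fdopen()               */
    free(argStruct);                /* release the heap block allocated before pthread_create() */
    return NULL;
}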
Track the value of your internal data structure back to where it comes from and you will see that it originates from a struct object that is not initialized. You later assign values to some of the fields, but not to all of them.
Always initialize struct objects, and at the same time make sure you have a convention that makes it clear what default initialization (as if done with 0) means for the type.
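In practice that can be as simple as zero-initializing at the point of definition (the type here is made up for illustration):
struct conn_args { int fd; char *name; double rate; };

struct conn_args a = {0};          /* every field starts out as 0 / NULL                */
struct conn_args b = { .fd = -1 }; /* designated initializer: unnamed fields are zeroed */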
If, one day, you really have a performance bottleneck because your compiler doesn't optimize an unused initialization, think of it again and do it differently. Here, because you are launching threads and do other complicated stuff, the difference will never be measurable.
I'm writing a program in C. I have two main development machines, both Macs. One is running OS X 10.5 and is a 32-bit machine; the other is running OS X 10.6 and is 64-bit. The program works fine when compiled and run on the 64-bit machine. However, when I compile the exact same program on the 32-bit machine it runs for a while and then crashes somewhere inside malloc. Here's the backtrace:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xeeb40fe0
0x9036d598 in small_malloc_from_free_list ()
(gdb) bt
#0 0x9036d598 in small_malloc_from_free_list ()
#1 0x90365286 in szone_malloc ()
#2 0x903650b8 in malloc_zone_malloc ()
#3 0x9036504c in malloc ()
#4 0x0000b14c in xmalloc (s=2048) at Common.h:185
...
xmalloc is my custom wrapper, which just calls exit if malloc returns NULL, so the program is not simply running out of memory.
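For reference, the wrapper is essentially just this (a simplified sketch, not the exact Common.h code):
#include <stdio.h>
#include <stdlib.h>

/* Abort instead of returning NULL so callers never have to check. */
static void *xmalloc(size_t s)
{
    void *p = malloc(s);
    if (p == NULL) {
        fprintf(stderr, "xmalloc: out of memory (%zu bytes)\n", s);
        exit(EXIT_FAILURE);
    }
    return p;
}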
If I link the same code with -ltcmalloc it works fine, so I strongly suspect that it's a bug somewhere inside OS X 10.5's default allocator. It may be that my program is causing some memory corruption somewhere and that tcmalloc somehow doesn't get tripped up by it. I tried to reproduce the failure by doing the same sequence of mallocs and frees in a different program but that worked fine.
So my questions are:
Has anyone seen this bug before? Or, alternatively
How can I debug something like this? E.g., is there a debug version of OS X's malloc?
BTW, these are the linked libraries:
$ otool -L ./interp
./interp:
/usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.1.5)
Update: Yeah, it's heap corruption due to writing past the end of an array; it's working now. I should have run valgrind before posting the question. I was nevertheless interested in techniques (other than valgrind) for protecting against this kind of corruption, so thanks for that.
Have you read the manual page for malloc() on MacOS X? In part, it says:
DEBUGGING ALLOCATION ERRORS
A number of facilities are provided to aid in debugging allocation errors in applications. These
facilities are primarily controlled via environment variables. The recognized environment variables
and their meanings are documented below.
ENVIRONMENT
The following environment variables change the behavior of the allocation-related functions.
MallocLogFile <f>
Create/append messages to the given file path instead of writing to
the standard error.
MallocGuardEdges
If set, add a guard page before and after each large block.
MallocDoNotProtectPrelude
If set, do not add a guard page before large blocks, even if the
MallocGuardEdges environment variable is set.
MallocDoNotProtectPostlude
If set, do not add a guard page after large blocks, even if the
MallocGuardEdges environment variable is set.
MallocStackLogging
If set, record all stacks, so that tools like leaks can be used.
MallocStackLoggingNoCompact
If set, record all stacks in a manner that is compatible with the
malloc_history program.
MallocStackLoggingDirectory
If set, records stack logs to the directory specified instead of saving
them to the default location (/tmp).
MallocScribble
If set, fill memory that has been allocated with 0xaa bytes. This
increases the likelihood that a program making assumptions about the contents of freshly allocated memory will fail. Also if set, fill memory
that has been deallocated with 0x55 bytes. This increases the likelihood
that a program will fail due to accessing memory that is no longer allocated.
MallocCheckHeapStart <s>
If set, specifies the number of allocations <s> to wait before beginning
periodic heap checks every <n> as specified by MallocCheckHeapEach. If
MallocCheckHeapStart is set but MallocCheckHeapEach is not specified, the
default check repetition is 1000.
MallocCheckHeapEach <n>
If set, run a consistency check on the heap every <n> operations.
MallocCheckHeapEach is only meaningful if MallocCheckHeapStart is also
set.
MallocCheckHeapSleep <t>
Sets the number of seconds to sleep (waiting for a debugger to attach)
when MallocCheckHeapStart is set and a heap corruption is detected. The
default is 100 seconds. Setting this to zero means not to sleep at all.
Setting this to a negative number means to sleep (for the positive number
of seconds) only the very first time a heap corruption is detected.
MallocCheckHeapAbort <b>
When MallocCheckHeapStart is set and this is set to a non-zero value,
causes abort(3) to be called if a heap corruption is detected, instead of
any sleeping.
MallocErrorAbort
If set, causes abort(3) to be called if an error was encountered in
malloc(3) or free(3), such as calling free(3) on a pointer previously
freed.
MallocCorruptionAbort
Similar to MallocErrorAbort but will not abort in out of memory conditions, making it more useful to catch only those errors which will cause
memory corruption. MallocCorruptionAbort is always set on 64-bit processes.
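For example, the binary from the otool output above could be started with a few of these enabled (the particular combination is only an illustration):
$ MallocScribble=1 MallocGuardEdges=1 MallocCheckHeapStart=1 MallocCheckHeapEach=1000 ./interp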
That said, I'd still use valgrind first.
Has anyone seen this bug before
Yes, this is a common programming bug and it is almost certainly in your code. See http://www.efnetcpp.org/wiki/Heap_Corruption
How can I debug something like this?
See the Tools section of the above link.
I have the following problem with my C program: somewhere there is a stack overflow. Despite compiling without optimization and with debug symbols, the program exits with this output (within or outside of gdb on Linux):
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
The only way I could detect that this actually is a stack overflow was by running the program through valgrind. Is there any way I can force the operating system to dump a call stack trace which would help me locate the problem?
Sadly, gdb does not allow me to easily tap into the program either.
If you allow the system to dump core files you can analyze them with gdb:
$ ulimit -c unlimited # bash command to allow cores of unlimited size
$ ./stack_overflow
Segmentation fault (core dumped)
$ gdb -c core stack_overflow
gdb> bt
#0 0x0000000000400570 in f ()
#1 0x0000000000400570 in f ()
#2 0x0000000000400570 in f ()
...
Sometimes I have seen a badly generated core file with an incorrect stack trace, but in most cases the bt will yield a bunch of recursive calls to the same method.
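For what it's worth, a backtrace like the one above typically comes from unbounded recursion; a deliberately broken reproduction looks like this:
/* Deliberately overflows the stack: every call adds another frame until the
 * guard page is hit and the process receives SIGSEGV. */
static int f(int n)
{
    return f(n + 1) + n;   /* no base case, and not a tail call, so the frames pile up */
}

int main(void)
{
    return f(0);
}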
The core file might have a different name that includes the process id; this depends on the default kernel configuration on your system, but it can be controlled with (run as root or with sudo):
$ sysctl kernel.core_uses_pid=1
With GCC you can try this:
-fstack-protector
Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.
-fstack-protector-all
Like -fstack-protector except that all functions are protected.
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Optimize-Options.html#Optimize-Options
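A typical invocation would be along these lines (the file and program names are just placeholders):
$ gcc -g -fstack-protector-all -o myprog myprog.c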
When a program dies with SIGSEGV, it normally dumps core on Unix. Could you load that core into the debugger and check the state of the stack?