Debugging segfault with no apparent cause in gdb? - c

gdb was reporting that my C code was crashing somewhere in malloc(), so I linked my code with Electric Fence to pinpoint the actual source of the memory error. Now my code is segfaulting much earlier, but gdb's output is even more confusing:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x30026b00 (LWP 4003)]
0x10007c30 in simulated_status (axis=1, F=0x300e7fa8, B=0x1003a520, A=0x3013b000, p=0x1003b258, XS=0x3013b000)
at ccp_gch.c:799
EDIT: The full backtrace:
(gdb) bt
#0 0x10007c30 in simulated_status (axis=1, F=0x300e7fa8, B=0x1003a520, A=0x3013b000, p=0x1003b258, XS=0x3013b000)
at ccp_gch.c:799
#1 0x10007df8 in execute_QUERY (F=0x300e7fa8, B=0x1003a520, iData=0x7fb615c0) at ccp_gch.c:836
#2 0x10009680 in execute_DATA_cmd (P=0x300e7fa8, B=0x7fb615cc, R_type=0x7fb615d0, iData=0x7fb615c0)
at ccp_gch.c:1581
#3 0x10015bd8 in do_volley (client=13) at session.c:76
#4 0x10015ef4 in do_dialogue (v=12, port=2007) at session.c:149
#5 0x10016350 in do_session (starting_port=2007, ports=1) at session.c:245
#6 0x100056e4 in main (argc=2, argv=0x7fb618f4) at main.c:271
The relevant code (slightly modified due to reasons):
796 static uint32_t simulated_status(
797 unsigned axis, struct foo *F, struct bar *B, struct Axis *A, BAZ *p, uint64_t *XS)
798 {
799 uint32_t result = A->status;
800 *XS = get_status(axis);
801 if (!some_function(p)) {
802 ...
The obvious thing to check would be whether A->status is valid memory, but it is. Removing the assignment pushes the segfault to line 800, and removing that assignment causes some other assignment in the if-block to segfault. It looks as though either accessing an argument passed to the function or writing to a local variable is what's causing the segfault, but everything points to valid memory according to gdb.
How am I to interpret this? I've never seen anything like this before, so any suggestions / pointers in the right direction would be appreciated. I'm using GNU gdb 6.8-debian, Electric Fence 2.1, and running on a PowerPC 405 (uname reports Linux powerpmac 2.6.30.3 #24 [...] ppc GNU/Linux).

I'm guessing, but your symptoms are similar to what could happen in a stack overflow situation. The -fstack-protector suggestion in the comments is on the right track here. I'd recommend adding the -fstack-check option as well.
If the SEGV is occurring because of writes to the guard page protecting the stack then an info registers and info frame in gdb would help confirm if this is the case.

Related

I was expecting segmentation fault or some kind of out of bound exception but did not get it when using command line arguments in a C program

I am learning C programming from "Learn c the hard way by Zed Shaw". He asks the learner to try and break their own code.
So I tried the following C code and thought printing more values that I gave argv will break it but it did not until later.
#include<stdio.h>
int main(int argc, char *argv[])
{
int i = 0;
printf("This is argc: %d\n",argc);
printf("This is argv[argc]: %s\n",argv[argc]);
printf("This is argv[0]: %s\n",argv[0]);
for(i=argc;i<100;i++)
printf("This is argv[%d]: %s\n",i,argv[i]);
for(i=1;i<argc;i++)
{
printf("arg %d: %s\n",i,argv[i]);
}
return 0;
}
When I try to print argv upto 100:
I see the following when I was expecting some kind of out of bound or segmentation fault.
./exp10_so These are cmd args
This is argc: 5
This is argv[argc]: (null)
This is argv[0]: ./exp10_so
This is argv[5]: (null)
This is argv[6]: TERMINATOR_DBUS_NAME=net.tenshu.Terminator21a9d5db22c73a993ff0b42f64b396873
This is argv[7]: GTK_RC_FILES=/etc/gtk/gtkrc:/home/ab/.gtkrc:/home/ab/.config/gtkrc
This is argv[8]: _=/home/ab/Projects/learn_c_the_hard_way/./exp10_so
This is argv[9]: LANG=en_IN
This is argv[10]: GTK3_MODULES=xapp-gtk3-module
This is argv[11]: XDG_CURRENT_DESKTOP=KDE
This is argv[12]: QT_LINUX_ACCESSIBILITY_ALWAYS_ON=1
This is argv[13]: LC_IDENTIFICATION=en_IN
This is argv[14]: XCURSOR_THEME=breeze_cursors
This is argv[15]: XDG_SESSION_CLASS=user
This is argv[16]: XDG_SESSION_TYPE=x11
This is argv[17]: SHLVL=1
This is argv[18]: TERMINATOR_UUID=urn:uuid:4496f24b-8a64-43af-ab5a-03fc7e722242
This is argv[19]: DESKTOP_SESSION=plasma
This is argv[20]: LC_MEASUREMENT=en_IN
This is argv[21]: OLDPWD=/home/ab/Projects
This is argv[22]: HOME=/home/ab
This is argv[23]: KDE_SESSION_VERSION=5
This is argv[24]: USER=ab
This is argv[25]: TERMINATOR_DBUS_PATH=/net/tenshu/Terminator2
This is argv[26]: SESSION_MANAGER=local/tgh:#/tmp/.ICE-unix/2372,unix/tgh:/tmp/.ICE-unix/2372
This is argv[27]: XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session1
This is argv[28]: DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
This is argv[29]: XDG_VTNR=1
This is argv[30]: XDG_SEAT=seat0
This is argv[31]: LC_NUMERIC=en_IN
This is argv[32]: BROWSER=/usr/bin/firefox
This is argv[33]: GTK_MODULES=canberra-gtk-module
This is argv[34]: XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
This is argv[35]: XDG_DATA_DIRS=/home/ab/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share:/var/lib/snapd/desktop
This is argv[36]: XDG_SESSION_DESKTOP=KDE
This is argv[37]: VTE_VERSION=6401
This is argv[38]: KDE_SESSION_UID=1000
This is argv[39]: LC_TIME=en_IN
This is argv[40]: MAIL=/var/spool/mail/ab
This is argv[41]: LOGNAME=ab
This is argv[42]: QT_AUTO_SCREEN_SCALE_FACTOR=0
This is argv[43]: LC_PAPER=en_IN
This is argv[44]: PATH=/usr/local/nginx/sbin:/home/ab/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/var/lib/snapd/snap/bin
This is argv[45]: QT_SCREEN_SCALE_FACTORS=LVDS1=1;DP1=1;HDMI1=1;VGA1=1;VIRTUAL1=1;
This is argv[46]: XDG_RUNTIME_DIR=/run/user/1000
This is argv[47]: SHELL=/bin/zsh
This is argv[48]: XDG_SESSION_ID=2
This is argv[49]: LC_MONETARY=en_IN
This is argv[50]: GTK2_RC_FILES=/etc/gtk-2.0/gtkrc:/home/ab/.gtkrc-2.0:/home/ab/.config/gtkrc-2.0
This is argv[51]: LC_TELEPHONE=en_IN
This is argv[52]: EDITOR=/usr/bin/nano
This is argv[53]: COLORTERM=truecolor
This is argv[54]: MOTD_SHOWN=pam
This is argv[55]: KDE_APPLICATIONS_AS_SCOPE=1
This is argv[56]: PAM_KWALLET5_LOGIN=/run/user/1000/kwallet5.socket
This is argv[57]: KDE_FULL_SESSION=true
This is argv[58]: XAUTHORITY=/home/ab/.Xauthority
This is argv[59]: LC_NAME=en_IN
This is argv[60]: DISPLAY=:0
This is argv[61]: LC_ADDRESS=en_IN
This is argv[62]: PWD=/home/ab/Projects/learn_c_the_hard_way
This is argv[63]: XCURSOR_SIZE=24
This is argv[64]: TERM=xterm-256color
This is argv[65]: ZSH=/home/ab/.oh-my-zsh
This is argv[66]: PAGER=less
This is argv[67]: LESS=-R
This is argv[68]: LSCOLORS=Gxfxcxdxbxegedabagacad
This is argv[69]: LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
This is argv[70]: LD_LIBRARY_PATH=/usr/local/lib
This is argv[71]: (null)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==69851==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000021 (pc 0x7f3c30d7b4c6 bp 0x7ffe273b2ba0 sp 0x7ffe273b22e8 T0)
==69851==The signal is caused by a READ memory access.
==69851==Hint: address points to the zero page.
#0 0x7f3c30d7b4c6 in __sanitizer::internal_strlen(char const*) /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167
#1 0x7f3c30d0d057 in printf_common /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors_format.inc:545
#2 0x7f3c30d0d41c in __interceptor_vprintf /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1639
#3 0x7f3c30d0d517 in __interceptor_printf /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1697
#4 0x562c5e03f290 in main /home/ab/Projects/learn_c_the_hard_way/exp10_so.c:13
#5 0x7f3c30b0ab24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
#6 0x562c5e03f0bd in _start (/home/ab/Projects/learn_c_the_hard_way/exp10_so+0x10bd)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167 in __sanitizer::internal_strlen(char const*)
==69851==ABORTING
A segmentation fault happens when the code try to access a memory region that is not available.
Accessing an array out of bounds doesn't means that the memory before or after the area occupied by the array is not available: The compiler or the runtime usually put all varibales or data in general in a given block of memory. If your array is the last item of such a memory block, the accessing it with a to big index will produce a Segmentaion Fault but is the array is in the middle of the memory block, you will just access memory used for other data, giving unexpected result and undefined behavior.
If the array (In may example, but valid for anything) is written, accessing available memory will not produce a segmentation fault but will overwrite something else. It may produce unexpected results or crash or segmentation fault later! This kind of bug is frequently very difficult to find because the unexpected result/behavior looks completely independent of the root cause.

Intel SIMD instructions causing segmentation faults [duplicate]

How to deal with SIGSEGV, Segmentation fault. while using Avx2 (_mm256_load_pd)(_mm256_store_pd)
(solved)
_mm256_load_pd
I've received segmentation fault wile called
_mm256_load_pd
usage are as blew
double * Val = malloc(sizeof(double)*4);
__m256d vecv = _mm256_load_pd(&Val[0]);
gdb shows
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fc5017 in _mm256_load_pd (__P=0x555555559370)
at /usr/lib/gcc/x86_64-linux-gnu/9/include/avxintrin.h:862
862 return *(__m256d *)__P;
(gdb) frame 1
#1 gemv_d_lineProduct_4_avx2 (Val=0x555555559370, indx=0x5555555592f0,
Vector_X=0x5555555592c0, Vector_Y=0x555555559340)
at someThing.c:114
114 __m256d vecv = _mm256_load_pd(&Val[0]);
(gdb)
_mm256_store_pd
while I make Val bigger
double * Val = malloc(sizeof(double)*4);
I found _mm256_load_pd works rightly but result in
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fc50e3 in _mm256_store_pd (__A=..., __P=0x555555559390)
at /usr/lib/gcc/x86_64-linux-gnu/9/include/avxintrin.h:868
868 *(__m256d *)__P = __A;
(gdb) frame 1
#1 gemv_d_lineProduct_4_avx2 (Val=0x5555555593e0, indx=0x555555559310,
Vector_X=0x5555555592c0, Vector_Y=0x555555559390)
at something.c:122
122 _mm256_store_pd(Vector_Y,vecY);
full project
https://github.com/DevilInChina/gemv
mkdir build;cd build
cmake ..
make
cd ../bin
./line
#then might get some seg fault
Method of solving
change memory allocate function to
void *aligned_alloc (size_t __alignment, size_t __size);
first parameter should be 1024 or something else.
Thanks to igor-r
According to the Intel reference, _mm256_load_pd() requires 32-byte aligned pointer.
Please, use aligned_alloc() to allocate a memory chunk having the proper alignment.

Fixing AddressSanitizer: strcpy-param-overlap with memmove?

I am poking around in an old & quite buggy C program. When compiled with gcc -fsanitize=address I got this error while running the program itself:
==635==ERROR: AddressSanitizer: strcpy-param-overlap: memory ranges [0x7f37e8cfd5b5,0x7f37e8cfd5b8) and [0x7f37e8cfd5b5, 0x7f37e8cfd5b8) overlap
#0 0x7f390c3a8552 in __interceptor_strcpy /build/gcc/src/gcc/libsanitizer/asan/asan_interceptors.cc:429
#1 0x56488e5c1a08 in backupExon src/BackupGenes.c:72
#2 0x56488e5c2df1 in backupGene src/BackupGenes.c:134
#3 0x56488e5c426e in BackupArrayD src/BackupGenes.c:227
#4 0x56488e5c0bb1 in main src/geneid.c:583
#5 0x7f390b6bfee2 in __libc_start_main (/usr/lib/libc.so.6+0x26ee2)
#6 0x56488e5bf46d in _start (/home/darked89/proj_soft/geneidc/crg_github/geneidc/bin/geneid+0x1c46d)
0x7f37e8cfd5b5 is located 3874229 bytes inside of 37337552-byte region [0x7f37e894b800,0x7f37eace71d0)
allocated by thread T0 here:
#0 0x7f390c41bce8 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:153
#1 0x56488e618728 in RequestMemoryDumpster src/RequestMemory.c:801
#2 0x56488e5bfcea in main src/geneid.c:305
#3 0x7f390b6bfee2 in __libc_start_main (/usr/lib/libc.so.6+0x26ee2)
The error was caused by this line:
/* backupExon src/BackupGenes.c:65 */
strcpy(d->dumpSites[d->ndumpSites].subtype, E->Acceptor->subtype);
I have replaced it with:
memmove(d->dumpSites[d->ndumpSites].subtype, E->Acceptor->subtype,
strlen(d->dumpSites[d->ndumpSites].subtype));
The error went away and the program output produced with 2 different data inputs is identical to the results obtained before the change. BTW, more of strcpy bugs remain further down in the source. I need a confirmation that this is the way to fix it.
The issue & the rest of the code is here:
https://github.com/darked89/geneidc/issues/2
Assuming that E->Acceptor->subtype is at least as long as d->dumpSites[d->ndumpSites].subtype then there's no problem. You might want to check that first if you didn't already. Actually, you need a +1 to also copy the string terminator (\0), thanks #R.. for spotting it.
Your previous code was making a different assumption: it was assuming that d->dumpSites[d->ndumpSites].subtype was at least as long as E->Acceptor->subtype (the opposite basically).
The real equivalent would be:
memmove(
d->dumpSites[d->ndumpSites].subtype,
E->Acceptor->subtype,
strlen(E->Acceptor->subtype) + 1
);
This is the correct way to fix the code to allow overlapping.

Modified stack in multi-threaded case

We're loading a symbol from a shared library via dlsym() under GNU/Linux and obviously get some kind of race condition resulting in a segmentation fault. The backtrace looks something like this:
(gdb) backtrace
#0 do_lookup_x at dl-lookup.c:366
#1 _dl_lookup_symbol_x at dl-lookup.c:829
#2 do_sym at dl-sym.c:168
#3 _dl_sym at dl-sym.c:273
#4 dlsym_doit at dlsym.c:50
#5 _dl_catch_error at dl-error.c:187
#6 _dlerror_run at dlerror.c:163
#7 __dlsym at dlsym.c:70
#8 ... (our code)
My local machine uses glibc-2.23.
I discovered, that the library handle given to __dlsym() in frame #7 is different to the handle passed to _dlerror_run(). It runs wild in the following lines in dlsym.c:
void *
__dlsym (void *handle, const char *name DL_CALLER_DECL)
{
# ifdef SHARED
if (__glibc_unlikely (_dlfcn_hook != NULL))
return _dlfcn_hook->dlsym (handle, name, DL_CALLER);
# endif
struct dlsym_args args;
args.who = DL_CALLER;
args.handle = handle; /* <------------------ this isn't my handle! */
args.name = name;
/* Protect against concurrent loads and unloads. */
__rtld_lock_lock_recursive (GL(dl_load_lock));
void *result = (_dlerror_run (dlsym_doit, &args) ? NULL : args.sym);
__rtld_lock_unlock_recursive (GL(dl_load_lock));
return result;
}
GDB says
(gdb) frame 7
#7 __dlsym at dlsym.c:70
(gdb) p *(struct link_map *)args.handle
$36 = {l_addr= 140736951484536, l_name = 0x7fffe0000078 "\300\215\r\340\377\177", ...}
so this is obviously garbage. The same occurs in the higher frames, e.g. in frame #2:
(gdb) frame 2
#2 do_sym at dl-sym.c:168
(gdb) p handle
$38 = {l_addr= 140736951484536, l_name = 0x7fffe0000078 "\300\215\r\340\377\177", ...}
Unfortunately the parameter handle in frame #7 can't be displayed:
(gdb) p handle
$37 = <optimized out>
but surprisingly in frame #8 and further down in our code the handle was correct:
(gdb) frame 8
#8 ...
(gdb) p *(struct link_map *)libHandle
$38 = {l_addr = 140737160646656, l_name = 0x7fffd8005b60 "/path/to/libfoo.so", ...}
Now my conclusion is, that the variable args must be modified during the execution inside __dlsym() but I can't see where and why.
I have to confess, there's a second aspect to this problem: It only occurs in a multi-threaded environment and only sometimes. But as you can see, there are some counter measures for race conditions in the implementation of __dlsym() since they're calling __rtld_lock_(un)lock_recursive() and the local variable args isn't shared across threads. And curiously enough, the problem still persists, if I make frame #8 mutual exclusive among my threads.
Questions: What are possible sources for the discrepancy in the library handle between frame #8 and frame #7?
Question 2: Does dlopen() yield different values for different threads? Or to put it differently: Is it possible to share the handles returned by dlopen() between different threads.
Update: I thank everybody commenting on this question and trying to answer it despite the lack of almost any viable information to do so. I found the solution of this problem. As foreseen by the commenters, it was totaly unrelated to the stacktraces and other information I provided. Hence, I consider this question as closed and will flag it for deletion. So Long, and Thanks for All the Fish
What are possible sources for the discrepancy in the library handle between frame #8 and frame #7?
The most likely cause is mismatch between ld-linux.so and libdl.so. As stated in this answer, ld-linux and libdl must come from the same build of GLIBC, or bad things will happen.
The mismatch can come from (A) trying to point to a different libc build via LD_LIBRARY_PATH, or (B) by static linking of libdl.a into the program.
The (gdb) info shared should show you which libraries are currently loaded. If you see something other than installed system ld-linux and libdl, then (A) is likely your problem.
For (B), you probably got (and ignored) a linker warning to the effect that your program will require at runtime the same libc version that you used to link it. Contrary to popular belief, fully-static binaries are less portable on Linux, not more.

Strange stack overflow?

I'm running into a strange situation with passing a pointer to a structure with a very large array defined in the struct{} definition, a float array around 34MB in size. In a nutshell, the psuedo-code looks like this:
typedef config_t{
...
float values[64000][64];
} CONFIG;
int32_t Create_Structures(CONFIG **the_config)
{
CONFIG *local_config;
int32_t number_nodes;
number_nodes = Find_Nodes();
local_config = (CONFIG *)calloc(number_nodes,sizeof(CONFIG));
*the_config = local_config;
return(number_nodes);
}
int32_t Read_Config_File(CONFIG *the_config)
{
/* do init work here */
return(SUCCESS);
}
main()
{
CONFIG *the_config;
int32_t number_nodes,rc;
number_nodes = Create_Structures(&the_config);
rc = Read_Config_File(the_config);
...
exit(0);
}
The code compiles fine, but when I try to run it, I'll get a SIGSEGV at the { beneath Read_Config_File().
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
0x0000000000407d0a in Read_Config_File (the_config=Cannot access memory at address 0x7ffffdf45428
) at ../src/config_parsing.c:763
763 {
(gdb) bt
#0 0x0000000000407d0a in Read_Config_File (the_config=Cannot access memory at address 0x7ffffdf45428
) at ../src/config_parsing.c:763
#1 0x00000000004068d2 in main (argc=1, argv=0x7fffffffe448) at ../src/main.c:148
I've done this sort of thing all the time, with smaller arrays. And strangely, 0x7fffffffe448 - 0x7ffffdf45428 = 0x20B8EF8, or about the 34MB of my float array.
Valgrind will give me similar output:
==10894== Warning: client switching stacks? SP change: 0x7ff000290 --> 0x7fcf47398
==10894== to suppress, use: --max-stackframe=34311928 or greater
==10894== Invalid write of size 8
==10894== at 0x407D0A: Read_Config_File (config_parsing.c:763)
==10894== by 0x4068D1: main (main.c:148)
==10894== Address 0x7fcf47398 is on thread 1's stack
The error messages all point to me clobbering the stack pointer, but a) I've never run across one that crashes on entry of the function and b) I'm passing pointers around, not the actual array.
Can someone help me out with this? I'm on a 64-bit CentOS box running kernel 2.6.18 and gcc 4.1.2
Thanks!
Matt
You've blown up the stack by allocating one of these huge config_t structs onto it. The two stack pointers on evidence in the gdb output, 0x7fffffffe448 and 0x7ffffdf45428, are very suggestive of this.
$ gdb
GNU gdb 6.3.50-20050815 ...blahblahblah...
(gdb) p 0x7fffffffe448 - 0x7ffffdf45428
$1 = 34312224
There's your ~34MB constant that matches the size of the config_t struct. Systems don't give you that much stack space by default, so either move the object off the stack or increase your stack space.
The short answer is that there must be a config_t declared as a local variable somewhere, which would put it on the stack. Probably a typo: missing * after a CONFIG declaration somewhere.

Resources