When I run my program I get a segmentation fault, so I ran it under Valgrind, which printed the message below. The error occurs with the code shown after it. Any idea what is going on here?
==21471== Invalid write of size 8
==21471== at 0x4802511: _vgnU_freeres (vg_preloaded.c:64)
==21471== by 0x38A715397F: ???
==21471== by 0x38A6E4D549: printf (in /lib64/libc-2.5.so)
==21471== by 0x401D52: call_func(int) (replication.cpp:752)
==21471== by 0x6137C7: ???
==21471== by 0x40621C: AdvanceFramesMT(void*) (pthreads.cpp:1020)
==21471== by 0x38A7A0673C: start_thread (in /lib64/libpthread-2.5.so)
==21471== by 0x38A6ED44BC: clone (in /lib64/libc-2.5.so)
==21471== Address 0x612ba8 is 14216 bytes inside data symbol "func_stack"
Code
static char func_stack[16384];
static ucontext_t uctx_main[16], uctx_func[16];

void call_func( int n )
{
    printf( "Message %d!", n );
}

if (getcontext(&uctx_func[tid]) == -1)
    handle_error("getcontext");
uctx_func[tid].uc_stack.ss_sp = func_stack;
uctx_func[tid].uc_stack.ss_size = sizeof(func_stack);
uctx_func[tid].uc_link = &uctx_main[tid];
makecontext(&uctx_func[tid], (void(*)())call_func, 1, 2);
if (swapcontext(&uctx_main[tid], &uctx_func[tid]) == -1)
    handle_error("swapcontext");
Try to improve the Valgrind stack traces first; that will hopefully make the problem easier to understand. Are you compiling with the gcc options -fomit-frame-pointer or -fstack-check? These can make Valgrind stack traces worse, with ??? in place of symbol names (see the Valgrind FAQ).
OK, I got it now. Actually I was using this for multiple threads. That is why uctx_main[16] and uctx_func[16] are arrays. However, I forgot to make func_stack also an array (a 2-dimensional array in fact). So I changed it to char func_stack[16][16384] and it solved the problem.
I am learning C programming from "Learn C the Hard Way" by Zed Shaw. He asks the learner to try to break their own code.
So I tried the following C code, thinking that printing more values than I passed in argv would break it, but it did not until much later.
#include <stdio.h>
int main(int argc, char *argv[])
{
    int i = 0;
    printf("This is argc: %d\n", argc);
    printf("This is argv[argc]: %s\n", argv[argc]);
    printf("This is argv[0]: %s\n", argv[0]);
    for (i = argc; i < 100; i++)
        printf("This is argv[%d]: %s\n", i, argv[i]);
    for (i = 1; i < argc; i++)
    {
        printf("arg %d: %s\n", i, argv[i]);
    }
    return 0;
}
When I try to print argv up to index 100, I see the following, although I was expecting some kind of out-of-bounds error or segmentation fault:
./exp10_so These are cmd args
This is argc: 5
This is argv[argc]: (null)
This is argv[0]: ./exp10_so
This is argv[5]: (null)
This is argv[6]: TERMINATOR_DBUS_NAME=net.tenshu.Terminator21a9d5db22c73a993ff0b42f64b396873
This is argv[7]: GTK_RC_FILES=/etc/gtk/gtkrc:/home/ab/.gtkrc:/home/ab/.config/gtkrc
This is argv[8]: _=/home/ab/Projects/learn_c_the_hard_way/./exp10_so
This is argv[9]: LANG=en_IN
This is argv[10]: GTK3_MODULES=xapp-gtk3-module
This is argv[11]: XDG_CURRENT_DESKTOP=KDE
This is argv[12]: QT_LINUX_ACCESSIBILITY_ALWAYS_ON=1
This is argv[13]: LC_IDENTIFICATION=en_IN
This is argv[14]: XCURSOR_THEME=breeze_cursors
This is argv[15]: XDG_SESSION_CLASS=user
This is argv[16]: XDG_SESSION_TYPE=x11
This is argv[17]: SHLVL=1
This is argv[18]: TERMINATOR_UUID=urn:uuid:4496f24b-8a64-43af-ab5a-03fc7e722242
This is argv[19]: DESKTOP_SESSION=plasma
This is argv[20]: LC_MEASUREMENT=en_IN
This is argv[21]: OLDPWD=/home/ab/Projects
This is argv[22]: HOME=/home/ab
This is argv[23]: KDE_SESSION_VERSION=5
This is argv[24]: USER=ab
This is argv[25]: TERMINATOR_DBUS_PATH=/net/tenshu/Terminator2
This is argv[26]: SESSION_MANAGER=local/tgh:#/tmp/.ICE-unix/2372,unix/tgh:/tmp/.ICE-unix/2372
This is argv[27]: XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session1
This is argv[28]: DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
This is argv[29]: XDG_VTNR=1
This is argv[30]: XDG_SEAT=seat0
This is argv[31]: LC_NUMERIC=en_IN
This is argv[32]: BROWSER=/usr/bin/firefox
This is argv[33]: GTK_MODULES=canberra-gtk-module
This is argv[34]: XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
This is argv[35]: XDG_DATA_DIRS=/home/ab/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share:/var/lib/snapd/desktop
This is argv[36]: XDG_SESSION_DESKTOP=KDE
This is argv[37]: VTE_VERSION=6401
This is argv[38]: KDE_SESSION_UID=1000
This is argv[39]: LC_TIME=en_IN
This is argv[40]: MAIL=/var/spool/mail/ab
This is argv[41]: LOGNAME=ab
This is argv[42]: QT_AUTO_SCREEN_SCALE_FACTOR=0
This is argv[43]: LC_PAPER=en_IN
This is argv[44]: PATH=/usr/local/nginx/sbin:/home/ab/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/var/lib/snapd/snap/bin
This is argv[45]: QT_SCREEN_SCALE_FACTORS=LVDS1=1;DP1=1;HDMI1=1;VGA1=1;VIRTUAL1=1;
This is argv[46]: XDG_RUNTIME_DIR=/run/user/1000
This is argv[47]: SHELL=/bin/zsh
This is argv[48]: XDG_SESSION_ID=2
This is argv[49]: LC_MONETARY=en_IN
This is argv[50]: GTK2_RC_FILES=/etc/gtk-2.0/gtkrc:/home/ab/.gtkrc-2.0:/home/ab/.config/gtkrc-2.0
This is argv[51]: LC_TELEPHONE=en_IN
This is argv[52]: EDITOR=/usr/bin/nano
This is argv[53]: COLORTERM=truecolor
This is argv[54]: MOTD_SHOWN=pam
This is argv[55]: KDE_APPLICATIONS_AS_SCOPE=1
This is argv[56]: PAM_KWALLET5_LOGIN=/run/user/1000/kwallet5.socket
This is argv[57]: KDE_FULL_SESSION=true
This is argv[58]: XAUTHORITY=/home/ab/.Xauthority
This is argv[59]: LC_NAME=en_IN
This is argv[60]: DISPLAY=:0
This is argv[61]: LC_ADDRESS=en_IN
This is argv[62]: PWD=/home/ab/Projects/learn_c_the_hard_way
This is argv[63]: XCURSOR_SIZE=24
This is argv[64]: TERM=xterm-256color
This is argv[65]: ZSH=/home/ab/.oh-my-zsh
This is argv[66]: PAGER=less
This is argv[67]: LESS=-R
This is argv[68]: LSCOLORS=Gxfxcxdxbxegedabagacad
This is argv[69]: LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
This is argv[70]: LD_LIBRARY_PATH=/usr/local/lib
This is argv[71]: (null)
AddressSanitizer:DEADLYSIGNAL
=================================================================
==69851==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000021 (pc 0x7f3c30d7b4c6 bp 0x7ffe273b2ba0 sp 0x7ffe273b22e8 T0)
==69851==The signal is caused by a READ memory access.
==69851==Hint: address points to the zero page.
#0 0x7f3c30d7b4c6 in __sanitizer::internal_strlen(char const*) /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167
#1 0x7f3c30d0d057 in printf_common /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors_format.inc:545
#2 0x7f3c30d0d41c in __interceptor_vprintf /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1639
#3 0x7f3c30d0d517 in __interceptor_printf /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1697
#4 0x562c5e03f290 in main /home/ab/Projects/learn_c_the_hard_way/exp10_so.c:13
#5 0x7f3c30b0ab24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
#6 0x562c5e03f0bd in _start (/home/ab/Projects/learn_c_the_hard_way/exp10_so+0x10bd)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_libc.cpp:167 in __sanitizer::internal_strlen(char const*)
==69851==ABORTING
A segmentation fault happens when code tries to access a memory region that is not available to the process.
Accessing an array out of bounds does not mean that the memory before or after the array is unavailable. The compiler or the runtime usually places variables, and data in general, together in a block of memory. If your array is the last item in such a block, accessing it with too large an index will produce a segmentation fault; but if the array is in the middle of the block, you will simply access memory used for other data, giving unexpected results. Either way it is undefined behavior.
If the array (as in this example, but the same holds for any object) is written to, accessing available memory will not produce a segmentation fault but will overwrite something else. That may produce unexpected results, a crash, or a segmentation fault later! This kind of bug is often very difficult to find because the unexpected behavior looks completely unrelated to the root cause.
While working on my compiler I got this error:
Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:33
How do I get details of what went wrong here? I know from the backtrace it's a memcpy line that causes it, but how do I see how the memory is aligned? And how do I know how it should be aligned?
The project is a compiler with an LLVM back-end using the Zend/PHP runtime with the OCaml garbage collector, so there are a lot of things that can go wrong.
I suspect this line being part of the problem:
zend_string *str = (zend_string *)caml_alloc(ZEND_MM_ALIGNED_SIZE(_STR_HEADER_SIZE + len + 1), 0);
where caml_alloc replaces the pemalloc call used in the Zend source code.
The segfault happens when doing 10'000 string concatenations. This is the output from valgrind:
==7501== Invalid read of size 8
==7501== at 0x4C2F790: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7501== by 0x4D7E58: subsetphp_concat_function (bindings.c:160)
==7501== by 0x4D7F52: foo (llvm_test.s:21)
==7501== by 0x4D7FA9: main (llvm_test.s:60)
==7501== Address 0x61db938 is 2,660,600 bytes inside a block of size 3,936,288 free'd
==7501== at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7501== by 0x4C2627: do_compaction (in /home/olle/kod/subsetphp/test)
==7501== by 0x4C2735: caml_compact_heap (in /home/olle/kod/subsetphp/test)
==7501== by 0x4D08DF: caml_major_collection_slice (in /home/olle/kod/subsetphp/test)
==7501== by 0x4D2DCF: caml_minor_collection (in /home/olle/kod/subsetphp/test)
==7501== by 0x4D2FBC: caml_check_urgent_gc (in /home/olle/kod/subsetphp/test)
==7501== by 0x4D7C45: subsetphp_string_alloc (bindings.c:90)
==7501== by 0x4D7CEE: subsetphp_string_init (bindings.c:122)
==7501== by 0x4D7DEA: subsetphp_concat_function (bindings.c:149)
==7501== by 0x4D7F52: foo (llvm_test.s:21)
==7501== by 0x4D7FA9: main (llvm_test.s:60)
Any tips appreciated.
Edit:
extern value subsetphp_concat_function(value v1, value v2)
{
    CAMLparam2(v1, v2);
    zend_string *str1 = Zend_string_val(v1);
    zend_string *str2 = Zend_string_val(v2);
    size_t str1_len = str1->len;
    size_t str2_len = str2->len;
    size_t result_len = str1_len + str2_len;
    value result = subsetphp_string_init("", result_len, 1);
    zend_string *zend_result = Zend_string_val(result);
    if (str1_len > SIZE_MAX - str2_len) {
        zend_error_noreturn(E_ERROR, "String size overflow");
    }
    memcpy(zend_result->val, str1->val, str1_len); // This is line 160
    memcpy(zend_result->val + str1_len, str2->val, str2_len);
    zend_result->len = result_len;
    zend_result->val[result_len] = '\0';
    CAMLreturn(result);
}
Edit 2:
Since valgrind gives me this line
Address 0x61db938 is 2,660,600 bytes inside a block of size 3,936,288 free'd
I guess I'm trying to copy something that has already been freed, meaning that I don't tell the OCaml GC correctly when something is no longer referenced.
This error tells you that something bad happened during memcpy, probably something like a null pointer or an error in the sizes.
Don't bother with __memcpy_sse2_unaligned; it is an implementation detail of memcpy. memcpy has many implementations optimized for different cases and dispatches dynamically to the most efficient one for the context. This one seems to be selected when SSE2 instructions are available and the pointers are not aligned to 16-byte boundaries (aligned SSE2 loads such as movdqa fault on unaligned addresses), so it presumably either uses unaligned loads or copies a few bytes at a time until a 16-byte boundary is reached and then switches to the fast path.
As for the OCaml GC details specific to LLVM, you need to be quite careful about how you handle heap pointers. Since you don't say whether you are using the gcroot mechanism or the new statepoints, I will assume you are using gcroot.
Since the OCaml GC is a moving collector (it moves values from the minor heap to the major heap, and again during compaction), every allocation can potentially invalidate a pointer. That means it is usually unsafe to factor out field accesses to heap-allocated values. For instance, this is unsafe:
v = field(0, x)
r = function_call(...)
w = field(0, v)
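because the function call could perform allocations that trigger a compaction, after which v may still point to the old location. It is safer to reload the field after the call: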
v = field(0, x)
r = function_call(...)
v' = field(0, x)
w = field(0, v')
By the way, I'm not even certain that the gcroot mechanism can correctly handle a moving GC (that is, that LLVM doesn't perform optimizations it shouldn't).
So that usually means that it's not a good idea to use gcroot with OCaml's GC. The new way is better for that kind of GC, but you still need to be careful not to keep using a pointer across function calls or allocations.
So your error may be linked to that kind of problem: the pointer was valid at some point, then a value was moved during compaction, which left some GC page unused and therefore freed.
I'm programming in C, and when I use Valgrind to check for memory errors, the following error is shown:
==9756== Invalid write of size 4
==9756== at 0x40164D: main (flowTracker.c:294)
==9756== Address 0x24 is not stack'd, malloc'd or (recently) free'd
Line 294 of flowTracker.c is the following:
tabla_hash[clave_hash]->contador++;
And the declaration of tabla_hash is:
#define TAMANHO_TABLA 1048576
typedef struct{
int tiempo_ini;
int tiempo_ult;
uint8_t quintupla[13];
int num_bytes;
int num_SYN;
int num_ACK;
int contador;
double pack_s;
double bits_s;
} FlujoIP;
FlujoIP *tabla_hash[TAMANHO_TABLA];
As 4566976 pointed out, tabla_hash[clave_hash] is (probably) NULL. That's just a guess, as you haven't provided an MCVE that reproduces the issue without us having to fill in the blanks or fix compiler errors...
It seems to me that you probably meant to declare tabla_hash like so: FlujoIP tabla_hash[TAMANHO_TABLA]; (though, wow! That's a huge array)... and you should then be able to change -> to . like so: tabla_hash[clave_hash].contador++;
Alternatively, if you were to precede the offending statement with if (tabla_hash[clave_hash] == NULL) { tabla_hash[clave_hash] = malloc(sizeof tabla_hash[clave_hash][0]); } or something similar, that might also be appropriate (note that malloc leaves contador uninitialized; calloc would zero it)... Don't forget to free all of the items within your huge array.
This question has been asked several times, but I think my situation is more specific:
I have a C program that works perfectly on my OS X system (it is too large to post). I have already tested it with Valgrind, and it reports no missing frees, no leaks, and no invalid writes; every problem it found has been fixed.
When I run the program over ssh on an external server with a small dataset (see the code below, my_length < 1000), it works without any problem. But with a larger dataset, on the Linux terminal, I get this error:
*** Error in `./a.out': free(): invalid next size (fast): 0x00000000016b9ed0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3e50475cff]
/lib64/libc.so.6[0x3e5047cff8]
./a.out[0x41083c]
./a.out[0x402374]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3e50421d65]
./a.out[0x400e79]
======= Memory map: ========
00400000-00418000 r-xp 00000000 00:4d 89038
[...]
and when I run it on Solaris, it complains:
malloc failed
at a line where I allocate a three dimensional array:
int ***A, ***B, ***C;
A = malloc(sizeof(int**)*2); B = malloc(sizeof(int**)*2); C = malloc(sizeof(int**)*2);
int i;
for (i = 0; i < 2; i++) {
    A[i] = malloc(sizeof(int*)* my_length);
    B[i] = malloc(sizeof(int*)* my_length);
    C[i] = malloc(sizeof(int*)* my_length);
    for (j = 0; j <= my_length2; j++) {
        A[i][j] = malloc(sizeof(int)* my_length2);
        B[i][j] = malloc(sizeof(int)* my_length2);
        C[i][j] = malloc(sizeof(int)* my_length2); <== malloc failed here??
    }
}
where my_length and my_length2 get really really huge!
I am getting desperate! Does someone have any clue what my problem could be?
There are so many duplicates found for this question that annoyingly, I cannot find the right one for you.
The basic problem is that your program has most definitely written over the memory block tracking information that the malloc/free library uses.
Somewhere in your program is a memory write that is out of bounds.
OK, I found one possible lead: I increased my values step by step, and now Valgrind reports the following:
==3954== Invalid write of size 8
==3954== at 0x344C1B: _platform_memmove$VARIANT$Unknown (in /usr/lib/system/libsystem_platform.dylib)
==3954== by 0x1C4D74: __memcpy_chk (in /usr/lib/system/libsystem_c.dylib)
==3954== by 0x10000B2E4: my_method (delete.c:1461)
==3954== by 0x1000025B3: main (delete.c:365)
==3954== Address 0x1020611a0 is 16 bytes after a block of size 2,096 alloc'd
==3954== at 0x56AA: realloc (vg_replace_malloc.c:698)
==3954== by 0x10000B21E: my_method (delete.c:1458)
==3954== by 0x1000025B3: main (delete.c:365)
And this is the code; I have no idea why this error appears:
if (temp_length + strlen(new_substring)
        > max_seq_lens[i]) {
    max_len[i] *= 2;
    my_array[i].name = realloc(sizeof(char)* max_seq_lens[i]); <===
}
temp_length += (some_num);
Here temp_length holds the current length of my_array[i].name. I am trying to concatenate a new string (new_substring) onto it, and before concatenating I check whether there is enough memory. I really don't see my mistake here.
I am getting segmentation fault on line 8 in the code below.
typedef struct _my_struct {
int pArr[21];
int arr1[8191];
int arr2[8191];
int m;
int cLen;
int gArr[53];
int dArr[8191];
int data[4096];
int rArr[53];
int eArr[1024];
};
void *populate_data(void *arg) {
1 register int mask =1, iG;
2 struct _my_struct *var ;
3 var = arg; // arg is passed as initialized struct variable while creating thread
4 var->m = 13;
5 var->arr2[var->m] = 0;
6 for (iG = 0; iG < var->m; iG++) {
7 var->arr2[iG] = mask;
8 var->arr1[var->arr2[iG]] = iG;
9 if (var->pArr[iG] != 0) // pArr[]= 1011000000001
10 var->arr2[var->m] ^= mask;
11 mask <<= 1;
12 }
13 var->arr1[var->arr2[var->m]] = var->m;
14 mask >>= 1;
15 for (iG = var->m+ 1; iG < var->cLen; iG++) {
16 if (var->arr2[iG - 1] >= mask)
17 var->arr2[iG] = var->arr2[var->m] ^ ((var->arr2[iG- 1] ^ mask) << 1);
18 else
19 var->arr2[iG] = var->arr2[iG- 1] << 1;
20 var->arr1[var->arr2[iG]] = iG;
21 }
22 var->arr1[0] = -1;
}
Here is the thread function:
void main() {
unsigned int tid;
struct _my_struct *instance = NULL;
instance = (struct _my_struct *)malloc(sizeof(_my_struct ));
start_thread(&tid , 119312, populate_data, instance );
}
int
start_thread(unsigned int *tid, int stack_size, void * (*my_function)(void *), void *arg)
{
pthread_t ptid = -1;
pthread_attr_t pattrib;
pthread_attr_init(&pattrib);
if(stack_size > 0)
{
pthread_attr_setstacksize(&pattrib, stack_size);
}
else
{
pthread_attr_destroy(&pattrib);
return -1;
}
pthread_create(&ptid, &pattrib, my_function, arg);
pthread_attr_destroy(&pattrib);
return 0;
}
When I debug it with gdb, I get this error:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffdfec80700 (LWP 22985)]
0x0000000000401034 in populate_data (arg=0x7fffffffe5d8) at Queue.c:19
19 var->arr1[var->arr2[iG]] = iG;
and its backtrace is:
#0 0x0000000000401034 in populate_data (arg=0x7fffffffe5d8) at Queue.c:159
#1 0x00007ffff7bc6971 in start_thread () from /lib/libpthread.so.0
#2 0x00007ffff792292d in clone () from /lib/libc.so.6
#3 0x0000000000000000 in ?? ()
However, I'm unable to correct the error.
Any help is really appreciated.
Please show the calling code in start_thread.
It seems likely to be a stack and/or memory allocation error. The structure is fairly large (about 116 KB assuming 32-bit ints, close to the 119312-byte stack size requested for the thread) and might well overflow some stack limit.
Even more likely, it has gone out of scope, which is why the calling code must be shown.
I don't know if perhaps you've changed the names of the arrays in your _my_struct in order to hide the purpose of them (company confidential information, perhaps?), but if that's actually what you've named your arrays, I'm just going to suggest that you name them something that makes sense to you that when someone has to read your code 4 years from now, they'll have some hope of following your initialization loops & understanding what's going on. Same goes for your loop variable iG.
My next comment/question is: why are you firing off a thread to initialize this structure, which is on the stack of the main thread? Which thread is going to use the structure once it's initialized? Or are you going to create other threads that will use it? Do you have any mechanism (mutex? semaphore?) to ensure that the other threads won't start using the data until your initialization thread has finished? Which raises the question of why you bother with a separate thread at all: you could just call populate_data() straight from main() and not have to worry about synchronization, because no other threads would be started until initialization is done. On a multicore machine you might get some small benefit from letting a separate thread do the initialization while main() goes on and does other work, but given the size of your struct (not tiny, but not huge either) that benefit would be minuscule. On a single core you get no concurrency benefit at all; you would just waste time on context-switching overhead, so you'd be better off calling populate_data() directly from main().
Next comment is, your _my_struct is not huge, so it's not going to blow your stack by itself. But it ain't tiny either. If your app will always need only one copy of this struct, maybe you should make it a global variable or a file-scope variable, so it doesn't eat up stack space.
Finally, to your actual bug...
I didn't bother to try to decipher your cryptic looping code, but valgrind is telling me that you have some conditions that depend on uninitialized locations:
~/test/so$ valgrind a.out
==27663== Memcheck, a memory error detector
==27663== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==27663== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==27663== Command: a.out
==27663==
==27663== Thread 2:
==27663== Conditional jump or move depends on uninitialised value(s)
==27663== at 0x8048577: populate_data (so2.c:34)
==27663== by 0x593851: start_thread (in /lib/libpthread-2.5.so)
==27663== by 0x4BDA8D: clone (in /lib/libc-2.5.so)
==27663==
==27663== Conditional jump or move depends on uninitialised value(s)
==27663== at 0x804868A: populate_data (so2.c:40)
==27663== by 0x593851: start_thread (in /lib/libpthread-2.5.so)
==27663== by 0x4BDA8D: clone (in /lib/libc-2.5.so)
My so2.c line 34 corresponds with line 9 in your code posting above.
My so2.c line 40 corresponds with line 15 in your code posting above.
If I add the following at the top of populate_data(), these valgrind errors disappear:
memset(arg,0,sizeof(_my_struct_t));
(I modified your struct definition as follows:)
typedef struct _my_struct { int pArr[21]; ......... } _my_struct_t;
Now just because adding the memset() call makes the errors disappear doesn't necessarily mean that your loop logic is correct, it just means that now those locations are considered "initialized" by valgrind. If having all-zeros in those locations when your initialization loops begin is what your logic needs, then that should fix it. But you need to verify for yourself that such really is the proper solution.
BTW... someone suggested using calloc() to get a zeroed-out allocation (rather than using dirty stack space)... that would work too, but if you want populate_data() to be foolproof, you'll zero the memory in it and not in the caller, since (assuming you like your initialization logic as it is), populate_data() is the thing that depends on it being zeroed out, main() shouldn't have to care whether it is or not. Not a biggie either way.