Strange stack overflow? - c

I'm running into a strange situation with passing a pointer to a structure with a very large array defined in the struct{} definition, a float array around 34MB in size. In a nutshell, the psuedo-code looks like this:
typedef config_t{
...
float values[64000][64];
} CONFIG;
int32_t Create_Structures(CONFIG **the_config)
{
CONFIG *local_config;
int32_t number_nodes;
number_nodes = Find_Nodes();
local_config = (CONFIG *)calloc(number_nodes,sizeof(CONFIG));
*the_config = local_config;
return(number_nodes);
}
int32_t Read_Config_File(CONFIG *the_config)
{
/* do init work here */
return(SUCCESS);
}
main()
{
CONFIG *the_config;
int32_t number_nodes,rc;
number_nodes = Create_Structures(&the_config);
rc = Read_Config_File(the_config);
...
exit(0);
}
The code compiles fine, but when I try to run it, I'll get a SIGSEGV at the { beneath Read_Config_File().
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
0x0000000000407d0a in Read_Config_File (the_config=Cannot access memory at address 0x7ffffdf45428
) at ../src/config_parsing.c:763
763 {
(gdb) bt
#0 0x0000000000407d0a in Read_Config_File (the_config=Cannot access memory at address 0x7ffffdf45428
) at ../src/config_parsing.c:763
#1 0x00000000004068d2 in main (argc=1, argv=0x7fffffffe448) at ../src/main.c:148
I've done this sort of thing all the time, with smaller arrays. And strangely, 0x7fffffffe448 - 0x7ffffdf45428 = 0x20B8EF8, or about the 34MB of my float array.
Valgrind will give me similar output:
==10894== Warning: client switching stacks? SP change: 0x7ff000290 --> 0x7fcf47398
==10894== to suppress, use: --max-stackframe=34311928 or greater
==10894== Invalid write of size 8
==10894== at 0x407D0A: Read_Config_File (config_parsing.c:763)
==10894== by 0x4068D1: main (main.c:148)
==10894== Address 0x7fcf47398 is on thread 1's stack
The error messages all point to me clobbering the stack pointer, but a) I've never run across one that crashes on entry of the function and b) I'm passing pointers around, not the actual array.
Can someone help me out with this? I'm on a 64-bit CentOS box running kernel 2.6.18 and gcc 4.1.2
Thanks!
Matt

You've blown up the stack by allocating one of these huge config_t structs onto it. The two stack pointers on evidence in the gdb output, 0x7fffffffe448 and 0x7ffffdf45428, are very suggestive of this.
$ gdb
GNU gdb 6.3.50-20050815 ...blahblahblah...
(gdb) p 0x7fffffffe448 - 0x7ffffdf45428
$1 = 34312224
There's your ~34MB constant that matches the size of the config_t struct. Systems don't give you that much stack space by default, so either move the object off the stack or increase your stack space.

The short answer is that there must be a config_t declared as a local variable somewhere, which would put it on the stack. Probably a typo: missing * after a CONFIG declaration somewhere.

Related

qemu: uncaught target signal 11 (Segmentation fault) - core dumped, when trying to return a struct

I just noticed that I am unable to have a function return a struct.
I am running this on ARM32/debian docker image with threads enabled.
This is the function that gives me the run time error:
struct CEC_call des_CEC_call(char * buffy){
char request = buffy[0]; // fails here
buffy+=4;
char obligation = buffy[1];
buffy+=4;
struct CEC_call ceccall;
pepcall.request = request;
pepcall.obligation = obligation;
return ceccall;
}
But if I change the return type to void, there is no issue in running:
void des_CEC_call(char * buffy){
char request = buffy[0]; // doesn't fail here
buffy+=4;
char obligation = buffy[1];
buffy+=4;
struct CEC_call ceccall;
pepcall.request = request;
pepcall.obligation = obligation;
}
Return works fine as well with any of the standard return types.
Header where the struct is defined is included in the file with the function although it will still crash even if the struct is defined in the same file. Not sure how to proceed with debugging, any help appreciated.
EDIT:
More details, based on suggestions from comments:
I have rerun the same program on my mac as well as some other non arm architectures with docker, and it runs without any noticeable issues. Some aspects relating to bit shifting are slightly different as expected but no run time error from the segmentation fault. I tried running it with various optimisation levels, but to no avail.
I have used GDB before so I thought that might provide some insight, sadly I have not been able to get it to work on this container.
I ensured GDB is installed and recompiled the binary with -0g.
I ran docker with --cap-add=SYS_PTRACE and --security-opt seccomp=unconfined.
Each time I got:
warning: Could not trace the inferior process.
Error:
warning: ptrace: Function not implemented
During startup program exited with code 127.
I am able to use GDB with other non-arm, non-32bit docker images without any issues. I think this is enough for another question, as I've spent ages trying to get GDB working with that environment.
I am not sure really how to verify otherwise, but I have printed out the address buffy is pointing and the value held by buffy[0] in the preceding functions as well as the problematic one.
Without struct return:
address of buffy = 0xff58b9ec
buffer[0] = ff
address of buffy = 0xff58b9ec
buffer[0] = ff
address of buffy = 0xff58b9ec
buffer[0] = ff
With struct return:
address of buffy = 0xff58b9ec
buffer[0] = ff
address of buffy = 0xff58b9ec
buffer[0] = ff
address of buffy = (nil)
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Struct CEC_call does not have any other fields.
It could be a buffer overflow somewhere, but there aren't any buffers at least none made by me. I have not used QEMU IIRC or valingrad before, but will look into them in more details. I can not test nateively at the moment as I do not have the access to the intended embedded linux.
struct CEC_call ceccall;
pepcall.request = request;
pepcall.obligation = obligation;
It seems you have mismatch in names of your variables: ceccall and pepcall, and you return an uninitialized variable ceccall.
My problem was that the header for the file that has got the struct CEC_call des_CEC_call(char * buffy) function declaration has not been included in the calling file.
Function called worked fine if it was returning standard types or void, but with custom struct return the array pointer passed in was nullified. This kind of baffled me initially as I didn’t think it would compile due to missing declaration and this segmentation fault only happened on arm32 architecture, I didn't get that crash on OSX.

how many recursive function calls causes stack overflow?

I am working on a simulation problem written in c, the main part of my program is a recursive function.
when the recursive depth reaches approximately 500000 it seems stack overflow occurs.
Q1 : I want to know that is this normal?
Q2 : in general how many recursive function calls causes stack overflow?
Q3 : in the code below, removing local variable neighbor can prevent from stack overflow?
my code:
/*
* recursive function to form Wolff Cluster(= WC)
*/
void grow_Wolff_cluster(lattic* l, Wolff* wolff, site *seed){
/*a neighbor of site seed*/
site* neighbor;
/*go through all neighbors of seed*/
for (int i = 0 ; i < neighbors ; ++i) {
neighbor = seed->neighbors[i];
/*add to WC according to the Wolff Algorithm*/
if(neighbor->spin == seed->spin && neighbor->WC == -1 && ((double)rand() / RAND_MAX) < add_probability)
{
wolff->Wolff_cluster[wolff->WC_pos] = neighbor;
wolff->WC_pos++; // the number of sites that is added to WC
neighbor->WC = 1; // for avoiding of multiple addition of site
neighbor->X = 0;
///controller_site_added_to_WC();
/*continue growing Wolff cluster(recursion)*/
grow_Wolff_cluster(l, wolff, neighbor);
}
}
}
I want to know that is this normal?
Yes. There's only so much stack size.
In the code below, removing local variable neighbor can prevent from stack overflow?
No. Even with no variables and no return values the function calls themselves must be stored in the stack so the stack can eventually be unwound.
For example...
void recurse() {
recurse();
}
int main (void)
{
recurse();
}
This still overflows the stack.
$ ./test
ASAN:DEADLYSIGNAL
=================================================================
==94371==ERROR: AddressSanitizer: stack-overflow on address 0x7ffee7f80ff8 (pc 0x00010747ff14 bp 0x7ffee7f81000 sp 0x7ffee7f81000 T0)
#0 0x10747ff13 in recurse (/Users/schwern/tmp/./test+0x100000f13)
SUMMARY: AddressSanitizer: stack-overflow (/Users/schwern/tmp/./test+0x100000f13) in recurse
==94371==ABORTING
Abort trap: 6
In general how many recursive function calls causes stack overflow?
That depends on your environment and function calls. Here on OS X 10.13 I'm limited to 8192K by default.
$ ulimit -s
8192
This simple example with clang -g can recurse 261976 times. With -O3 I can't get it to overflow, I suspect compiler optimizations have eliminated my simple recursion.
#include <stdio.h>
void recurse() {
puts("Recurse");
recurse();
}
int main (void)
{
recurse();
}
Add an integer argument and it's 261933 times.
#include <stdio.h>
void recurse(int cnt) {
printf("Recurse %d\n", cnt);
recurse(++cnt);
}
int main (void)
{
recurse(1);
}
Add a double argument, now it's 174622 times.
#include <stdio.h>
void recurse(int cnt, double foo) {
printf("Recurse %d %f\n", cnt, foo);
recurse(++cnt, foo);
}
int main (void)
{
recurse(1, 2.3);
}
Add some stack variables and it's 104773 times.
#include <stdio.h>
void recurse(int cnt, double foo) {
double this = 42.0;
double that = 41.0;
double other = 40.0;
double thing = 39.0;
printf("Recurse %d %f %f %f %f %f\n", cnt, foo, this, that, other, thing);
recurse(++cnt, foo);
}
int main (void)
{
recurse(1, 2.3);
}
And so on. But I can increase my stack size in this shell and get twice the calls.
$ ./test 2> /dev/null | wc -l
174622
$ ulimit -s 16384
$ ./test 2> /dev/null | wc -l
349385
I have a hard upper limit to how big I can make the stack of 65,532K or 64M.
$ ulimit -Hs
65532
A stack overflow isn’t defined by the C standard, but by the implementation. The C standard defines a language with unlimited stack space (among other resources) but does have a section about how implementations are allowed to impose limits.
Usually it’s the operating system that actually first creates the error. The OS doesn’t care about how many calls you make, but about the total size of the stack. The stack is composed of stack frames, one for each function call. Usually a stack frame consists of some combination of the following five things (as an approximation; details can vary a lot between systems):
The parameters to the function call (probably not actually here, in this case; they’re probably in registers, although this doesn’t actually buy anything with recursion).
The return address of the function call (in this case, the address of the ++i instruction in the for loop).
The base pointer where the previous stack frame starts
Local variables (at least those that don’t go in registers)
Any registers the caller wants to save when it makes a new function call, so the called function doesn’t overwrite them (some registers may be saved by the caller instead, but it doesn’t particularly matter for stack size analysis). This is why passing parameters in registers doesn’t help much in this case; they’ll end up on the stack sooner or later.
Because some of these (specifically, 1., 4., and 5.) can vary in size by a lot, it can be difficult to estimate how big an average stack frame is, although it’s easier in this case because of the recursion. Different systems also have different stack sizes; it currently looks like by default I can have 8 MiB for a stack, but an embedded system would probably have a lot less.
This also explains why removing a local variable gives you more available function calls; you reduced the size of each of the 500,000 stack frames.
If you want to increase the amount of stack space available, look into the setrlimit(2) function (on Linux like the OP; it may be different on other systems). First, though, you might want to try debugging and refactoring to make sure you need all that stack space.
Yes and no - if you come across a stack overflow in your code, it could mean a few things
Your algorithm is not implemented in a way that respects the amount of memory on the stack you have been given. You may adjust this amount to suit the needs of the algorithm.
If this is the case, it's more common to change the algorithm to more efficiently utilize the stack, rather than add more memory. Converting a recursive function to an iterative one, for example, saves a lot of precious memory.
It's a bug trying to eat all your RAM. You forgot a base case in the recursion or mistakenly called the same function. We've all done it at least 2 times.
It's not necessarily how many calls cause an overflow - it's dependent upon how much memory each individual call takes up on a stack frame. Each function call uses up stack memory until the call returns. Stack memory is statically allocated -- you can't change it at runtime (in a sane world). It's a last-in-first-out (LIFO) data structure behind the scenes.
It's not preventing it, it's just changing how many calls to grow_Wolff_cluster it takes to overflow the stack memory. On a 32-bit system, removing neighbor from the function costs a call to grow_Wolff_cluster 4 bytes less. It adds up quickly when you multiply that in the hundreds of thousands.
I suggest you learn more about how stacks work for you. Here's a good resource over on the software engineering stack exchange. And another here on stack overflow (zing!)
For each time a function recurs, your program takes more memory on the stack, the memory it takes for each function depends upon the function and variables within it. The number of recursions that can be done of a function is entirely dependant upon your system.
There is no general number of recursions that will cause stack overflow.
Removing the variable 'neighbour' will allow for the function to recur further as each recursion takes less memory, but it will still eventually cause stack overflow.
This is a simple c# function that will show you how many iteration your computer can take before stack overflow (as a reference, I have run up to 10478):
private void button3_Click(object sender, EventArgs e)
{
Int32 lngMax = 0;
StackIt(ref lngMax);
}
private void StackIt(ref Int32 plngMax, Int32 plngStack = 0)
{
if (plngStack > plngMax)
{
plngMax = plngStack;
Console.WriteLine(plngMax.ToString());
}
plngStack++;
StackIt(ref plngMax, plngStack);
}
in this simple case, the condition check: "if (plngStack > plngMax)" could be removed,
but if you got a real recursive function, this check will help you localize the problem.

Segmentation Fault: C-Program migrating from HPUX to Linux

I'm trying to migrate a small c program from hpux to linux. The project compiles fine but crashes at runtime showing me a segmentation fault. I've already tried to see behind the mirror using strace and gdb but still don't understand. The relevant (truncated) parts:
tts_send_2.c
Contains a method
int sequenznummernabgleich(int sockfd, char *snd_id, char *rec_id, int timeout_quit) {
TS_TEL_TAB tel_tab_S01;
int n;
# truncated
}
which is called from within that file like this:
. . .
. . .
switch(sequenznummernabgleich(sockfd,c_snd_id,c_rec_id,c_timeout_quit)) {
/* kritischer Fehler */
case -1:
. . .
. . .
when calling that method I'm presented a segmentation fault (gdb output):
Program received signal SIGSEGV, Segmentation fault.
0x0000000000403226 in sequenznummernabgleich (sockfd=<error reading variable: Cannot access memory at address 0x7fffff62f94c>,
snd_id=<error reading variable: Cannot access memory at address 0x7fffff62f940>, rec_id=<error reading variable: Cannot access memory at address 0x7fffff62f938>,
timeout_quit=<error reading variable: Cannot access memory at address 0x7fffff62f934>) at tts_snd_2.c:498
498 int sequenznummernabgleich(int sockfd, char *snd_id, char *rec_id, int timeout_quit) {
which I just don't understand. When I'm stepping to the line where the method is called using gdb, all the variables are looking fine:
1008 switch(sequenznummernabgleich(sockfd,c_snd_id,c_rec_id,c_timeout_quit)) {
(gdb) p sockfd
$9 = 8
(gdb) p &sockfd
$10 = (int *) 0x611024 <sockfd>
(gdb) p c_snd_id
$11 = "KR", '\000' <repeats 253 times>
(gdb) p &c_snd_id
$12 = (char (*)[256]) 0xfde220 <c_snd_id>
(gdb) p c_rec_id
$13 = "CO", '\000' <repeats 253 times>
(gdb) p &c_rec_id
$14 = (char (*)[256]) 0xfde560 <c_rec_id>
(gdb) p c_timeout_quit
$15 = 20
(gdb) p &c_timeout_quit
$16 = (int *) 0xfde660 <c_timeout_quit>
I've also created an strace output. Here's the last part concerning the code shown above:
strace output
Any ideas ? I've searched the web and of course stackoverflow for hours without finding a really similar case.
Thanks
Kriz
I haven't used an HP/UX in eons but do hazily remember enough for the following suggestions:
Make sure you're initializing variables / struts correctly. Use calloc instead of malloc.
Also don't assume a specific bit pattern order: eg low byte then high byte. Ska endian-ness of the machine. There are usually macros in the compiler that will handle the appropriate ordering for you.
Update 15.10.16
After debugging for even more hours I found the real Problem. On the first line of the Method "sequenznummernabgleich" is a declaration of a struct
TS_TEL_TAB tel_tab_S01;
This is defined as following:
typedef struct {
TS_BOF_REC bof;
TS_REM_REC rem;
TS_EOF_REC eof;
int bof_len;
int rem_len;
int eof_len;
int cnt;
char teltyp[LEN_TELTYP+1];
TS_TEL_ENTRY entries[MAX_TEL];
} TS_TEL_TAB;
and it's embedded struct TS_TEL_ENTRY
typedef struct {
int len;
char tel[MAX_TEL_LEN];
} TS_TEL_ENTRY;
The problem is that the value for MAX_TEL_LEN had been changed from 512 to 1024 and thus the struct almost doubled in size what lead to that the STACK SIZE was not big enough anymore.
SOLUTION
Simply set the stack size from 8Mb to 64Mb. This can be achieved using ulimit command (under linux).
List current stack size: ulimit -s
Set stack size to 64Mb: ulimit -s 65535
Note: Values for stack size are in kB.
For a good short ref on ulimit command have a look # ss64

Debugging segfault with no apparent cause in gdb?

gdb was reporting that my C code was crashing somewhere in malloc(), so I linked my code with Electric Fence to pinpoint the actual source of the memory error. Now my code is segfaulting much earlier, but gdb's output is even more confusing:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x30026b00 (LWP 4003)]
0x10007c30 in simulated_status (axis=1, F=0x300e7fa8, B=0x1003a520, A=0x3013b000, p=0x1003b258, XS=0x3013b000)
at ccp_gch.c:799
EDIT: The full backtrace:
(gdb) bt
#0 0x10007c30 in simulated_status (axis=1, F=0x300e7fa8, B=0x1003a520, A=0x3013b000, p=0x1003b258, XS=0x3013b000)
at ccp_gch.c:799
#1 0x10007df8 in execute_QUERY (F=0x300e7fa8, B=0x1003a520, iData=0x7fb615c0) at ccp_gch.c:836
#2 0x10009680 in execute_DATA_cmd (P=0x300e7fa8, B=0x7fb615cc, R_type=0x7fb615d0, iData=0x7fb615c0)
at ccp_gch.c:1581
#3 0x10015bd8 in do_volley (client=13) at session.c:76
#4 0x10015ef4 in do_dialogue (v=12, port=2007) at session.c:149
#5 0x10016350 in do_session (starting_port=2007, ports=1) at session.c:245
#6 0x100056e4 in main (argc=2, argv=0x7fb618f4) at main.c:271
The relevant code (slightly modified due to reasons):
796 static uint32_t simulated_status(
797 unsigned axis, struct foo *F, struct bar *B, struct Axis *A, BAZ *p, uint64_t *XS)
798 {
799 uint32_t result = A->status;
800 *XS = get_status(axis);
801 if (!some_function(p)) {
802 ...
The obvious thing to check would be whether A->status is valid memory, but it is. Removing the assignment pushes the segfault to line 800, and removing that assignment causes some other assignment in the if-block to segfault. It looks as though either accessing an argument passed to the function or writing to a local variable is what's causing the segfault, but everything points to valid memory according to gdb.
How am I to interpret this? I've never seen anything like this before, so any suggestions / pointers in the right direction would be appreciated. I'm using GNU gdb 6.8-debian, Electric Fence 2.1, and running on a PowerPC 405 (uname reports Linux powerpmac 2.6.30.3 #24 [...] ppc GNU/Linux).
I'm guessing, but your symptoms are similar to what could happen in a stack overflow situation. The -fstack-protector suggestion in the comments is on the right track here. I'd recommend adding the -fstack-check option as well.
If the SEGV is occurring because of writes to the guard page protecting the stack then an info registers and info frame in gdb would help confirm if this is the case.

Getting Segmentation Fault in C program

I am getting segmentation fault on line 8 in the code below.
typedef struct _my_struct {
int pArr[21];
int arr1[8191];
int arr2[8191];
int m;
int cLen;
int gArr[53];
int dArr[8191];
int data[4096];
int rArr[53];
int eArr[1024];
};
void *populate_data(void *arg) {
1 register int mask =1, iG;
2 struct _my_struct *var ;
3 var = arg; // arg is passed as initialized struct variable while creating thread
4 var->m = 13;
5 var->arr2[var->m] = 0;
6 for (iG = 0; iG < var->m; iG++) {
7 var->arr2[iG] = mask;
8 var->arr1[var->arr2[iG]] = iG;
9 if (var->pArr[iG] != 0) // pArr[]= 1011000000001
10 var->arr2[var->m] ^= mask;
11 mask <<= 1;
12 }
13 var->arr1[var->arr2[var->m]] = var->m;
14 mask >>= 1;
15 for (iG = var->m+ 1; iG < var->cLen; iG++) {
16 if (var->arr2[iG - 1] >= mask)
17 var->arr2[iG] = var->arr2[var->m] ^ ((var->arr2[iG- 1] ^ mask) << 1);
18 else
19 var->arr2[iG] = var->arr2[iG- 1] << 1;
20 var->arr1[var->arr2[iG]] = iG;
21 }
22 var->arr1[0] = -1;
}
Here is the thread function:
void main() {
unsigned int tid;
struct _my_struct *instance = NULL;
instance = (struct _my_struct *)malloc(sizeof(_my_struct ));
start_thread(&tid , 119312, populate_data, instance );
}
int
start_thread(unsigned int *tid, int stack_size, void * (*my_function)(void *), void *arg)
{
pthread_t ptid = -1;
pthread_attr_t pattrib;
pthread_attr_init(&pattrib);
if(stack_size > 0)
{
pthread_attr_setstacksize(&pattrib, stack_size);
}
else
{
pthread_attr_destroy(&pattrib);
return -1;
}
pthread_create(&ptid, &pattrib, my_function, arg);
pthread_attr_destroy(&pattrib);
return 0;
}
Once I debug it through gdb, got this error,
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffdfec80700 (LWP 22985)]
0x0000000000401034 in populate_data (arg=0x7fffffffe5d8) at Queue.c:19
19 var->arr1[var->arr2[iG]] = iG;
and its backtrace is:
#0 0x0000000000401034 in populate_data (arg=0x7fffffffe5d8) at Queue.c:159
#1 0x00007ffff7bc6971 in start_thread () from /lib/libpthread.so.0
#2 0x00007ffff792292d in clone () from /lib/libc.so.6
#3 0x0000000000000000 in ?? ()
However, I'm unable to correct the error.
Anyhelp is really appreciated.
Please show the calling code in start_thread.
It seems likely to be a stack and/or memory allocation error, the structure is pretty large (8 MB assuming 32-bit ints) and might well overflow some stack limit.
Even more possible is that it's gone out of scope, which is why the calling step must be shown.
I don't know if perhaps you've changed the names of the arrays in your _my_struct in order to hide the purpose of them (company confidential information, perhaps?), but if that's actually what you've named your arrays, I'm just going to suggest that you name them something that makes sense to you that when someone has to read your code 4 years from now, they'll have some hope of following your initialization loops & understanding what's going on. Same goes for your loop variable iG.
My next comment/question is, why are you firing off a thread to initialize this structure that's on the stack of the main thread? Which thread is going to be using this structure once it's initialized? Or are you going to make other threads that will use it? Do you have any mechanism (mutex? semaphore?) to ensure that the other threads won't start using the data until your initialization thread is done initializing it? Which sort of begs the question, why the heck are you bothering to fire off a separate thread to initialize it in the first place; you could just initialize it by calling populate_data() straight from main() and not even have to worry about synchronization because you wouldn't even be starting up any other threads until after it's done being initialized. If you're running on a multicore machine, you might get some small benefit from firing off that separate thread to do the initialization while main() goes on & does other stuff, but from the size of your struct (not tiny, but not huge either) it seems like that benefit would be very miniscule. And if you're running on a single core, you'll get no concurrency benefit at all; you'd just be wasting time firing off another thread to do it due to the context switching overhead; in a unicore environment you'd be better off just calling populate_data() directly from main().
Next comment is, your _my_struct is not huge, so it's not going to blow your stack by itself. But it ain't tiny either. If your app will always need only one copy of this struct, maybe you should make it a global variable or a file-scope variable, so it doesn't eat up stack space.
Finally, to your actual bug............
I didn't bother to try to decipher your cryptic looping code, but valgrind is telling me that you have some conditions that depend on uninitialized locations:
~/test/so$ valgrind a.out
==27663== Memcheck, a memory error detector
==27663== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==27663== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==27663== Command: a.out
==27663==
==27663== Thread 2:
==27663== Conditional jump or move depends on uninitialised value(s)
==27663== at 0x8048577: populate_data (so2.c:34)
==27663== by 0x593851: start_thread (in /lib/libpthread-2.5.so)
==27663== by 0x4BDA8D: clone (in /lib/libc-2.5.so)
==27663==
==27663== Conditional jump or move depends on uninitialised value(s)
==27663== at 0x804868A: populate_data (so2.c:40)
==27663== by 0x593851: start_thread (in /lib/libpthread-2.5.so)
==27663== by 0x4BDA8D: clone (in /lib/libc-2.5.so)
My so2.c line 34 corresponds with line 9 in your code posting above.
My so2.c line 40 corresponds with line 15 in your code posting above.
If I add the following at the top of populate_data(), these valgrind errors disappear:
memset(arg,0,sizeof(_my_struct_t));
(I modified your struct definition as follows:)
typedef struct _my_struct { int pArr[21]; ......... } _my_struct_t;
Now just because adding the memset() call makes the errors disappear doesn't necessarily mean that your loop logic is correct, it just means that now those locations are considered "initialized" by valgrind. If having all-zeros in those locations when your initialization loops begin is what your logic needs, then that should fix it. But you need to verify for yourself that such really is the proper solution.
BTW... someone suggested using calloc() to get a zeroed-out allocation (rather than using dirty stack space)... that would work too, but if you want populate_data() to be foolproof, you'll zero the memory in it and not in the caller, since (assuming you like your initialization logic as it is), populate_data() is the thing that depends on it being zeroed out, main() shouldn't have to care whether it is or not. Not a biggie either way.

Resources