Running valgrind --leak-check=yes on my executable gives the following errors.
==17325== 136 bytes in 1 blocks are possibly lost in loss record 17 of 21
==17325== at 0x4004C42: calloc (vg_replace_malloc.c:418)
==17325== by 0xCC5CA9: _dl_allocate_tls (in /lib/ld-2.5.so)
==17325== by 0xD0BF5C: pthread_create@@GLIBC_2.1 (in /lib/libpthread-2.5.so)
==17325== by 0x8049334: init (prog.c:238)
==17325== by 0x804C94F: main (prog.c:163)
It points to my pthread_create call. I call pthread_detach after creating the thread, and I don't want to call pthread_join. I searched for this and found that many people have hit the same issue, but I couldn't find the exact reason for it. Is this caused by the behavior of the pthread library? Can someone point me to good links that discuss this problem?
Calling pthread_join on a detached thread is illegal. Don't detach the thread if you want to be able to join it.
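A minimal sketch of the joinable alternative, assuming it is acceptable to wait for the thread at some point before exit. The names worker and run_joinable_worker are hypothetical and do not come from the question. The thread is created joinable (the default) and then joined, which lets the thread library release the per-thread block it allocated inside pthread_create before Valgrind inspects the heap:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: doubles the int it is given. */
static void *worker(void *arg)
{
    int *value = arg;
    *value *= 2;
    return NULL;
}

/* Creates a joinable thread and waits for it.
   Returns 0 on success, otherwise the error number of the failing call. */
int run_joinable_worker(int *value)
{
    assert(value != NULL);

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, value);
    if (rc != 0)
        return rc;

    /* No pthread_detach(tid) here: a thread is either detached or joined,
       never both. */
    return pthread_join(tid, NULL);
}
```

If the thread has to stay detached, a report like this is commonly seen when the process exits while the detached thread is still running. Whether that is what happens here cannot be told from the trace alone.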
For a project written in C, I am creating a number of semaphores with sem_open; some are binary, others are counting, though I don't think it matters. The semaphores are stored in a static structure (singleton). I then fork the process multiple times and afterwards exit all my forks. The main process, however, returns only after I have called sem_close() followed by sem_unlink() on all my semaphores.
void init_semaphore(void)
{
    ru->stop = sem_open("/stop", O_CREAT, 0644, 1);
    ...
}
void sem_close_all(void)
{
    sem_close(ru->stop);
    ...
    sem_unlink("/stop");
}
When I run valgrind I get the following errors:
==744644== 39 bytes in 1 blocks are still reachable in loss record 10 of 13
==744644== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==744644== by 0x49107A8: __sem_check_add_mapping (sem_routines.c:104)
==744644== by 0x49104BB: sem_open@@GLIBC_2.34 (sem_open.c:192)
==744644== by 0x10A34F: init_semaphore
==744644== by 0x10A0A9: init_all
==744644== by 0x10A05E: main
So I don't understand how to avoid those leaks that (I think) happen in my children?... Note: I can't use sem_destroy because of a project restriction.
Am I doing something wrong? I tried to use free() on sem_t *stop, but it didn't get rid of the leaks. Is there a solution I could use to fix this, or is it a valgrind false positive?... I tried calling my sem_close_all() function at the end of every process, but this does not fix the valgrind leak report.
I would not worry too much about it. Valgrind suppresses many other known memory leaks.
The error-checking tools detect numerous problems in the system
libraries, such as the C library, which come preinstalled with your
OS. You can't easily fix these, but you don't want to see these errors
(and yes, there are many!) So Valgrind reads a list of errors to
suppress at startup. A default suppression file is created by the
./configure script when the system is built.
You can also add a suppression for this report yourself, so that it stays hidden permanently.
tried to use free() on sem_t *stop
Do not do this.
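To make the point about free() concrete, here is a hedged sketch of the life cycle of a named semaphore. The function name semaphore_lifecycle and the semaphore name passed to it are hypothetical. With glibc, the pointer returned by sem_open refers to a mapping managed by the C library rather than to a block you obtained from malloc, so it is released with sem_close and the name is removed with sem_unlink:

```c
#include <assert.h>
#include <fcntl.h>       /* O_CREAT */
#include <semaphore.h>
#include <stddef.h>

/* Opens, uses, closes and unlinks a named semaphore.
   Returns 0 on success, -1 if any step fails. */
int semaphore_lifecycle(const char *name)
{
    assert(name != NULL);

    sem_t *stop = sem_open(name, O_CREAT, 0644, 1);
    if (stop == SEM_FAILED)
        return -1;

    int rc = 0;
    /* Take and give back the semaphore once, just to show it works. */
    if (sem_wait(stop) != 0 || sem_post(stop) != 0)
        rc = -1;

    /* Release with sem_close(), never with free(). */
    if (sem_close(stop) != 0)
        rc = -1;

    /* Remove the name; in a forking program only one process should do this. */
    if (sem_unlink(name) != 0)
        rc = -1;

    return rc;
}
```

This does not by itself promise a clean Valgrind report in every child process; it only shows the calls that are valid on a sem_t * obtained from sem_open.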
I'm having trouble identifying why valgrind is throwing this error:
==82185== Thread 2:
==82185== Use of uninitialised value of size 8
==82185== at 0x401B9A: proc_outconnection_thread (station.c:401)
==82185== by 0x4E3CDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==82185== by 0x51471AC: clone (in /usr/lib64/libc-2.17.so)
==82185==
the pass im sending is 'this'
==82185== Use of uninitialised value of size 8
==82185== at 0x401BCA: proc_outconnection_thread (station.c:403)
==82185== by 0x4E3CDF4: start_thread (in /usr/lib64/libpthread-2.17.so)
==82185== by 0x51471AC: clone (in /usr/lib64/libc-2.17.so)
==82185==
As a bit of background, the program I'm trying to create in C simulates a train station that uses TCP connections as "trains". I'm trying to get the program to use threads in order to both listen for and connect to other stations (other instances of the program).
The problem seems to occur when passing an internal data struct to a thread creation function via an argument struct that contains a pointer to the internal data struct. This way each thread has a pointer to the program's internal data.
In my efforts of testing, the file is compiled with
gcc -pthread -g -o station station.c -Wall -pedantic -std=gnu99
To produce my error, begin an instance of station with valgrind ./station tom authf logfile 3329 127.0.1.1
and then begin another instance with valgrind ./station tim authf logfile 3328 127.0.1.1
Due to an if statement in main, the station named tim will attempt to connect to tom, and tom will create a socket and listen for tim's attempt to connect. The connection seems to be successful; however, for some reason I'm also unable to flush the connection to send anything across it, which I have a feeling may be because of what Valgrind is telling me.
What's so strange is that when a thread is created for the connection on tom's instance, no errors in valgrind are thrown despite having a very similar procedure for creating the thread (the same arguments are passed through the argument pointer and the same assignments are made).
Could it be a false positive for tim's end, or am I doing something severely wrong here?
Your problem is that you pass a pointer to a local variable into the thread function. The simplest workaround is to declare this variable as static or global, but that is not a good idea if several threads use the variable.
It's better to allocate memory for the structure, initialize it, and pass that to the thread function:
ArgStruct *argStruct = malloc(sizeof(ArgStruct));
if(argStruct == NULL) {
    fprintf(stderr, "Can't alloc memory!\n");
    exit(98);
}
argStruct->internalStruct = internal;
argStruct->clientCon = fdopen(fd, "r+");
pthread_create(&threadId, NULL, proc_outconnection_thread, (void *)argStruct);
Also, don't forget to free this memory (at the end of proc_outconnection_thread() for example).
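A self-contained sketch of the same hand-off pattern, with the ownership rule spelled out. ThreadArgs, connection_thread and start_and_wait are hypothetical stand-ins for the question's ArgStruct and proc_outconnection_thread, and the pthread_join at the end is only there so the result can be observed:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument struct; the real one carries the station data. */
typedef struct {
    int  station_id;
    int *result;          /* where the thread reports back */
} ThreadArgs;

static void *connection_thread(void *p)
{
    ThreadArgs *args = p;
    *args->result = args->station_id + 1;
    free(args);           /* the thread owns the struct and releases it */
    return NULL;
}

/* Returns 0 on success, -1 if allocation, creation or joining fails. */
int start_and_wait(int station_id, int *result)
{
    assert(result != NULL);

    ThreadArgs *args = malloc(sizeof *args);
    if (args == NULL)
        return -1;
    args->station_id = station_id;   /* set every field before the hand-off */
    args->result = result;

    pthread_t tid;
    if (pthread_create(&tid, NULL, connection_thread, args) != 0) {
        free(args);                  /* thread never started: still ours */
        return -1;
    }
    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```

Because the struct lives on the heap, it stays valid after the creating function returns, and each thread gets its own copy instead of sharing one static variable.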
Track the value of your internal data structure back to where it comes from and you will see that it originates from a struct object that is not initialized. You later assign values to some of the fields, but not to all.
Always initialize struct objects, and at the same time watch that you have a convention that makes it clear what default initialization (as if done with 0) means for the type.
If, one day, you really have a performance bottleneck because your compiler doesn't optimize an unused initialization, think of it again and do it differently. Here, because you are launching threads and do other complicated stuff, the difference will never be measurable.
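As an illustration of that convention, here is a minimal sketch with a hypothetical StationData type (the real struct in the question is not shown, so its fields are assumptions). The = {0} initializer sets every member to zero or NULL, and the comment on each field records what that default means:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the program's internal data struct. */
typedef struct {
    int   port;        /* 0 means "not configured" */
    int   connected;   /* 0 means "no peer connected" */
    void *client;      /* NULL means "no connection yet" */
} StationData;

/* Returns a fully initialized struct: members that are not set
   explicitly are zero or NULL, never indeterminate. */
StationData station_data_init(int port)
{
    assert(port > 0);

    StationData d = {0};
    d.port = port;
    return d;
}
```

With this in place, a thread that reads a field nobody assigned sees a well-defined default instead of an uninitialised value.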
I have a strange problem freeing allocated memory in my MPI program.
Here is a code sample that reproduces the error for me:
void *out, *in;
int cnt = 2501; //if cnt<=2500: works perfectly. cnt>2500: crashes at free!
if((out = malloc(cnt * sizeof(double))) == NULL)
    MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OP);
if((in = malloc(cnt * sizeof(double))) == NULL)
    MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OP);
//Test data generation
//usage of MPI_Send and MPI_Reduce_local
//doing a lot of memcpy, assigning pointer synonyms to in and out, changing data to in and out
free(in); //crashes here with "munmap_chunk(): invalid pointer"
free(out); //and here (if above line is commented out) with "double free or corruption (!prev)"
I ran it using valgrind:
mpirun -np 2 valgrind --leak-check=full --show-reachable=yes ./foo
and got the following:
==6248== Warning: ignored attempt to set SIGRT32 handler in sigaction();
==6248== the SIGRT32 signal is used internally by Valgrind
cr_libinit.c:183 cri_init: sigaction() failed: Invalid argument
==6248== HEAP SUMMARY:
==6248== in use at exit: 0 bytes in 0 blocks
==6248== total heap usage: 1 allocs, 1 frees, 25 bytes allocated
==6248==
==6248== All heap blocks were freed -- no leaks are possible
==6248==
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 134
Any ideas about how to track this error down? Note that it only appears if cnt>2500!
If you’re using GNU glibc, you can set the environment variable MALLOC_CHECK_ to 2 before running your program, to enable extra checking on memory-allocation calls—details here.
The message you had above,
Warning: ignored attempt to set SIGRT32 handler in sigaction(); the
SIGRT32 signal is used internally by Valgrind cr_libinit.c:XXX
cri_init: sigaction() failed: Invalid argument
has to do with the MPI implementation's (mine is mpich-3.1.4-9) use of the BLCR checkpointing library (mine is blcr-0.8.5).
When my MPI had no BLCR support (run "mpiexec -info" and look at the "Checkpointing libraries available" line), Valgrind worked perfectly during my test phase.
When I recompiled my MPI with BLCR support (for checkpointing experiments), Valgrind stopped working entirely.
The conflict is serious: both programs want the same signal to interrupt your running program, and they cannot share it. (As the warning shows, Valgrind reserves SIGRT32 for its own use, so BLCR's cri_init cannot install its handler.)
I will try running two different MPI installations on the same machine (one with BLCR support and one without), and I hope to be able to alternate between Valgrind and checkpointing.
UPDATE:
Even when checkpointing itself worked, it was not possible to run any parallel executables with mpiexec (programs that had worked before).
The mpiexec command itself crashed when I used the compiled-in BLCR checkpointing library.
SOLUTION:
I recompiled MPI (mpich-3.1.4-9) WITHOUT BLCR support (I removed BLCR completely) and installed the DMTCP checkpointing solution (dmtcp-2.4.4), which not only works transparently but is also faster than BLCR (you will find the benchmarks in the bibliography).
Now everything runs as intended :) and the checkpointed jobs are handled properly too. In the future I will run heavier tests of DMTCP (using local files with heavy, active I/O from the parallel program).
PS. I also found out that MPICH removed BLCR from its distribution completely (July 2016).
I am writing a stock market system that uses several threads to process the incoming orders.
The project was going fine until I added one more thread. When I launch that thread, my program segfaults. The segfault is caused in that thread by an invalid memory read.
This segfault occurs only when the program is compiled with optimization level -O2 or above.
I compiled the program with debug info using -g3 and ran valgrind with
valgrind ./marketSim
and I get the following output about the segfault:
==2524== Thread 5:
==2524== Invalid read of size 4
==2524== at 0x402914: limitWorker (limit.c:4)
==2524== by 0x4E33D5F: start_thread (in /lib/libpthread-2.14.so)
==2524== Address 0x1c is not stack'd, malloc'd or (recently) free'd
==2524==
==2524==
==2524== Process terminating with default action of signal 11 (SIGSEGV)
==2524== Access not within mapped region at address 0x1C
==2524== at 0x402914: limitWorker (limit.c:4)
==2524== by 0x4E33D5F: start_thread (in /lib/libpthread-2.14.so)
The thread is launched like this
pthread_t limit_thread;
pthread_create(&limit_thread, NULL, limitWorker, q);
q is a variable that is also passed to the other threads I initialize.
The limitWorker code is as follows:
void *limitWorker(void *arg){
    while(1){
        if ((!lsl->empty) && (!lbl->empty)) {
            if ((currentPriceX10 > lGetHead(lsl)->price1) && (currentPriceX10 < lGetHead(lbl)->price1)) {
                llPairDelete(lsl,lbl);
            }
        }
    }
    return NULL;
}
Line 4: The line that, according to valgrind, produces the segfault is void *limitWorker(void *arg){
Some more info: this is compiled using gcc 4.6.1. With gcc 4.1.2 the program doesn't segfault, even when optimized, although its performance is much worse.
When the program is compiled using clang, it also doesn't segfault when optimized.
Question
Am I making a mistake? Is it a gcc bug? What course of action should I follow?
If you want to take a look at the code the github page is https://github.com/spapageo/Stock-Market-Real-Time-System/
The code in question is in file marketSim.c and limit.c
EDIT: Valgrind specifies that the invalid read happens at line 4. Line 4 is the "head" of the function. I don't know compiler internals, so my naive thought is that the argument is wrong. BUT when I use gdb after the segfault, gdb reports the argument as optimized out, because the program is optimized. So I don't think that is the culprit.
If you are compiling for a 64 bit system, then 0x1c is the offset of the price1 field within the order struct. This implies that either (or both) of lsl->HEAD and lbl->HEAD are NULL pointers when the fault occurs.
Note that because your limitWorker() function includes no thread synchronisation outside of the llPairDelete() function, it is incorrect, and the compiler may not reload those values on every iteration of the loop. You should be using a mutex to protect the linked lists even in the read-only paths.
Additionally, your lsl and lbl variables are multiply defined. You should declare them as extern in limit.h, and define them without the extern in limit.c.
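A minimal sketch of what a mutex-protected read path could look like. Order, OrderList and peek_head_price are simplified, hypothetical stand-ins for the project's own types and helpers, not its real API. The head pointer is checked for NULL and dereferenced only while the lock is held, so the reader cannot see a list that another thread is emptying:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified versions of the order and list types. */
typedef struct Order {
    int price1;
    struct Order *next;
} Order;

typedef struct {
    pthread_mutex_t lock;
    Order *head;              /* NULL when the list is empty */
} OrderList;

/* Copies the head price into *price while holding the lock.
   Returns 1 if a head existed, 0 if the list was empty. */
int peek_head_price(OrderList *list, int *price)
{
    assert(list != NULL && price != NULL);

    int found = 0;
    pthread_mutex_lock(&list->lock);
    if (list->head != NULL) {         /* never dereference a NULL head */
        *price = list->head->price1;
        found = 1;
    }
    pthread_mutex_unlock(&list->lock);
    return found;
}
```

The worker loop would then compare prices only when both peeks return 1, instead of testing an empty flag and reading the head in two unsynchronised steps.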
I have created 20 threads to read/write a shared file. I have synchronized the threads.
Now my program works fine, but when I run it with valgrind it gives me errors like this:
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks.
possibly lost: 624 bytes in 5 blocks.
still reachable: 1,424 bytes in 5 blocks.
suppressed: 0 bytes in 0 blocks.
Reachable blocks (those to which a pointer was found) are not shown.
Also, when I press Ctrl+C, it gives the same errors.
I have not even malloc'ed anything, but valgrind still complains.
Any suggestions would be appreciated.
You can run valgrind --leak-check=full ./prog_name to make sure these reachable blocks are not something you can destroy in your program. Many times initializing a library such as libcurl without closing or destroying it will cause leaks. If it's not something you have control over, you can write a suppression file. http://valgrind.org/docs/manual/mc-manual.html section 4.4 has some info and a link to some examples
Still reachable blocks are probably caused by your standard library not freeing memory used in pools for standard containers (see this faq), which is a performance optimisation for program exit, since the memory is about to be returned to the operating system anyway.
"Possibly lost" blocks are probably caused by the same thing.
The Valgrind Manual page for memcheck has a good explanation about the different sorts of leaks detected.
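For a feel of what "still reachable" means in practice, here is a small sketch with hypothetical names (g_cache, fill_cache, release_cache). A block whose address is still stored in a global when the program exits is reported as still reachable, because Memcheck can find a pointer to its start. Freeing it before exit removes the report:

```c
#include <assert.h>
#include <stdlib.h>

/* A global pointer keeps its block findable at exit: "still reachable". */
static char *g_cache = NULL;

/* Allocates the cache on first use and returns it (NULL if malloc fails).
   Later calls return the same block. */
char *fill_cache(size_t size)
{
    assert(size > 0);

    if (g_cache == NULL)
        g_cache = malloc(size);
    return g_cache;
}

/* Call before exit if you want the block gone from the leak summary. */
void release_cache(void)
{
    free(g_cache);
    g_cache = NULL;
}
```

Library-internal allocations behave like g_cache here, except that your program usually has no release function to call, which is why a suppression file is the usual remedy.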