For a project written in C, I am creating a number of semaphores with sem_open(); some are binary, others are counting, but I don't think that matters. The semaphores are stored in a static structure (singleton). I then fork the process multiple times and afterwards exit all my forks. The main process returns only after I have called sem_close() followed by sem_unlink() on all my semaphores.
void init_semaphore(void)
{
	/* Create the named semaphore if needed; initial value 1 (binary) */
	ru->stop = sem_open("/stop", O_CREAT, 0644, 1);
	...
}

void sem_close_all(void)
{
	/* Release this process's handle... */
	sem_close(ru->stop);
	...
	/* ...then remove the name so the system can reclaim the semaphore */
	sem_unlink("/stop");
}
When I run valgrind I get the following errors:
==744644== 39 bytes in 1 blocks are still reachable in loss record 10 of 13
==744644== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==744644== by 0x49107A8: __sem_check_add_mapping (sem_routines.c:104)
==744644== by 0x49104BB: sem_open@@GLIBC_2.34 (sem_open.c:192)
==744644== by 0x10A34F: init_semaphore
==744644== by 0x10A0A9: init_all
==744644== by 0x10A05E: main
So I don't understand how to avoid those leaks, which (I think) happen in my children. Note: I can't use sem_destroy() because of a project restriction.
Am I doing something wrong? I tried calling free() on sem_t *stop, but it didn't get rid of the leaks. I also tried calling my sem_close_all() function at the end of every process, but that does not fix the Valgrind leak report either. Is there a solution I could use, or is this a Valgrind false positive?
I would not worry too much about it: these blocks are still reachable rather than lost, and Valgrind already suppresses many similar known leaks in the system libraries.
The error-checking tools detect numerous problems in the system
libraries, such as the C library, which come preinstalled with your
OS. You can't easily fix these, but you don't want to see these errors
(and yes, there are many!) So Valgrind reads a list of errors to
suppress at startup. A default suppression file is created by the
./configure script when the system is built.
You can permanently add a suppression for this entry yourself.
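As a sketch, a suppression entry matched to the stack trace above (the frame names come from your report; adjust them to your own output) could be saved to a file, say sem.supp, and passed with --suppressions=:

{
   sem_open_reachable
   Memcheck:Leak
   match-leak-kinds: reachable
   fun:malloc
   fun:__sem_check_add_mapping
   fun:sem_open*
}

valgrind --suppressions=sem.supp ./yourprogram

Valgrind can also generate these entries for you if you run it with --gen-suppressions=all.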
tried to use free() on sem_t *stop
Do not do this. The sem_t * returned by sem_open() is owned by the C library, not by you; passing it to free() is undefined behavior. sem_close() is the only correct way to release it.
Related
I'm using POSIX threads (pthread.h) from glibc-2.27, and when my process calls pthread_create() eighteen times or more (it's supposed to be a heavily multi-threaded application) the process is aborted with the error message:
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
I did some strace as part of my debugging ritual and I found the reason. Apparently all of the implicit mmap() calls made by pthread_create() look like this:
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6de43fa000
One can notice the MAP_STACK flag which indicates:
Allocate the mapping at an address suitable for a process or thread stack.
This flag is currently a no-op, but is used in the glibc threading implementation so that if some architectures require special treatment for stack allocations, support can later be transparently implemented for glibc.
(man mmap on my system - Ubuntu 18.04 LTS)
Is it possible to configure the pthread_create() call not to do this? Or maybe use brk() or something else to grow the data segment automatically?
Thanks for any help!
It is extremely unlikely that your issue has anything to do with this MAP_STACK flag.
You have a bug somewhere else in your application which causes stack corruption. Try running your application under valgrind, or building with -fsanitize=address. Either approach may pinpoint the exact location of the error, and you should be able to figure out what is wrong based on that.
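For example, with AddressSanitizer (app.c and app are placeholder names for your build):

gcc -g -fsanitize=address -o app app.c
./app    # the first out-of-bounds write aborts with a report naming file and line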
Is it possible to configure the pthread_create call not to do this?
pthread_create() needs to allocate space for the thread's stack, otherwise the thread cannot run -- not even with an empty thread function. That's what the mmap you're seeing is for. It is not possible to do without.
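What you can control is how large that mapping is, via the thread attributes. A minimal sketch (the 256 KiB figure and thread_func are placeholders, not recommendations):

#include <limits.h>
#include <pthread.h>

pthread_t tid;
pthread_attr_t attr;

pthread_attr_init(&attr);
/* Shrink the per-thread stack from the default (8 MiB in your trace)
 * to 256 KiB; the value must be at least PTHREAD_STACK_MIN. */
pthread_attr_setstacksize(&attr, 256 * 1024);
pthread_create(&tid, &attr, thread_func, NULL);
pthread_attr_destroy(&attr);

The mmap() still happens; it is just smaller.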
or maybe use brk or something else to increase the data segment automatically?
If you have the time and skill to write your own thread library, then do have a go and let us know what happens. Otherwise, no: the details of how pthread_create() reserves space for the new thread's stack are not configurable in any implementation I know of.
And that does not matter anyway, because the mmap() call is not the problem. If a syscall has an unrecoverable failure then that's a failure of the kernel, and you get a kernel panic, not an application crash. GNU C's stack-smashing detection happens in userspace. The functions to which it applies therefore do not appear in your strace output, which traces only system calls.
It might be useful for you to have a better understanding of stack smashing and GNU's defense against it. Dr. Dobb's ran a nice article on just that several years ago, and it is still well worth a read. The bottom line, though, is that stack smashing happens when a function misbehaves by overwriting the part of its stack frame that contains its return address. Unless you've got some inline assembly going on, the smashing almost surely arises from one of your own functions overrunning the bounds of one of its local variables. It is detected when that function tries to return, by tooling in the function epilogue that serves that purpose.
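For illustration, a minimal hypothetical reproducer of exactly this failure mode, when built with gcc -fstack-protector-strong:

#include <string.h>

static void smash(const char *input)
{
	char buf[8];
	/* No bounds check: anything longer than 7 characters overruns buf
	   and clobbers the stack canary; the check in the function epilogue
	   then aborts with "*** stack smashing detected ***" on return. */
	strcpy(buf, input);
}

int main(void)
{
	smash("definitely more than eight bytes");
	return 0;
}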
Whatever program I run, Valgrind tells me that there are 72 possibly lost bytes in 3 blocks, even with a simple program like:
#include <stdio.h>

int main(void)
{
	printf("Hello, World!\n");
	return 0;
}
Do you know if this is a Valgrind bug on macOS Sierra?
How could I leak memory with a program like this?
That can very well happen if any of the preloaded libraries (e.g. via LD_PRELOAD) or any part of the linked C runtime has memory leaks.
There are also a couple of memory allocations performed by the CRT which are deliberately never freed, but typically these are a one-time thing and only happen once per process.
Valgrind cannot reliably distinguish between what's part of your application and what isn't. You can only check the stack trace from where the memory was allocated and decide whether that is your domain or not.
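If the allocation traces clearly point into the runtime rather than your own code, you can also have Valgrind emit ready-made suppression entries to save and reuse (assuming the hello-world above is compiled to ./hello, a placeholder name):

valgrind --leak-check=full --gen-suppressions=all ./hello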
Assume I have a C program (running under Linux) which manipulates many data structures, some complex, several of which can grow and shrink but should not, in general, grow over time. The program is observed to have a gradually increasing RSS over time (more so than can be explained by memory fragmentation), and I want to find what is leaking. Running under Valgrind is the obvious suggestion here, but Valgrind (with --leak-check=full and --show-reachable=yes) shows no leak. I believe this is because the data structures themselves are correctly being freed on exit, but one of them is growing during the life of the program. For instance, there might be a linked list which grows linearly over time because someone forgot to remove a resource from it, while the exit cleanup correctly frees all the items on the list. There is a philosophical question as to whether these are in fact 'leaks' if they are freed, of course (hence the quote marks in the question).
Are there any useful tools to instrument this? What I'd love is the ability to run under valgrind and have it produce a report of current allocations just as it does on exit, but to have this happen on a signal and allow the program to continue. I could then look for what stack trace signatures had growing allocations against them.
I can reliably get a nice large 'core' file from gdb with generate-core-file; is there some way to analyse that off-line, if say I compiled with a handy malloc() debugging library that instrumented malloc()?
I have full access to the source, and can modify it, but I don't really want to instrument every data structure manually, and moreover I'm interested in a general solution to the problem (like valgrind provides) rather than how to address this particular issue.
I've looked for similar questions on here but they appear all to be:
Why does my program leak memory?
How do I detect memory leaks at exit? (no use for me)
How do I detect memory leaks from a core file? (great, but none has a satisfactory answer)
If I was running under Solaris I'm guessing the answer would be 'use this handy dtrace script'.
Valgrind includes a gdbserver. This basically means you can use gdb to connect to it and, for example, issue a leak dump or show all reachable memory while the program is running. Of course, you have to judge whether there is a "memory leak" or not, as Valgrind can't know whether there's a bug in the application logic that fails to release memory while still keeping references to it.
Run valgrind with the --vgdb=yes flag and then run the commands:
valgrind --vgdb=yes --leak-check=full --show-reachable=yes ./yourprogram
gdb ./yourprogram
(gdb) target remote | vgdb
(gdb) monitor leak_check full reachable any
See the Valgrind documentation on the built-in gdbserver and on the Memcheck monitor commands for more info.
You can also do this programmatically in your program:
#include <valgrind/memcheck.h>
and at an appropriate place in the code do:
VALGRIND_DO_LEAK_CHECK;
(If I recall correctly, that will show reachable memory too, as long as Valgrind is run with --show-reachable=yes.)
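Since you wanted the report to happen on a signal while the program continues, here is a minimal sketch of that idea; only the Valgrind macro and the standard headers are givens, while the SIGUSR1 choice and the loop structure are hypothetical stand-ins for your program:

#include <signal.h>
#include <unistd.h>
#include <valgrind/memcheck.h>

static volatile sig_atomic_t dump_requested = 0;

static void on_sigusr1(int sig)
{
	(void)sig;
	dump_requested = 1;   /* only set a flag; the client request
	                         is not async-signal-safe */
}

int main(void)
{
	signal(SIGUSR1, on_sigusr1);
	for (;;) {                        /* stand-in for your main loop */
		if (dump_requested) {
			dump_requested = 0;
			VALGRIND_DO_LEAK_CHECK;   /* leak report now; program continues */
		}
		sleep(1);                 /* ...real work goes here... */
	}
}

Send SIGUSR1 (kill -USR1 <pid>) at two points in time and compare the successive reports to see which allocation stacks are growing.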
There's the Valgrind Massif tool which shows general memory usage of your application, not just for leaked memory. It breaks down malloc()s and free()s by calling functions and their backtraces, so you can see which functions keep allocating memory without releasing it. This can be an excellent tool for finding leaks of the type you mentioned.
Unfortunately the tooling around Massif is a bit weird. The ms_print tool provided with Valgrind is only useful for the most basic tasks; for real work you probably want something that displays a graph. There are several tools for this strewn around the net; see e.g. Valgrind Massif tool output graphical interface?
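For reference, a typical Massif run might look like this (a sketch; massif.out.<pid> is whatever output file Massif writes for your process):

valgrind --tool=massif ./yourprogram
ms_print massif.out.<pid>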
I have a strange problem freeing allocated memory in my MPI program.
Here is a code sample that produces the error for me:
void *out, *in;
int cnt = 2501; //if cnt<=2500: works perfectly. cnt>2500: crashes at free!
if((out = malloc(cnt * sizeof(double))) == NULL)
MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OP);
if((in = malloc(cnt * sizeof(double))) == NULL)
MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OP);
//Test data generation
//usage of MPI_Send and MPI_Reduce_local
//doing a lot of memcpy, assigning pointer synonyms to in and out, changing data to in and out
free(in); //crashes here with "munmap_chunk(): invalid pointer"
free(out); //and here (if above line is commented out) with "double free or corruption (!prev)"
I ran it using valgrind:
mpirun -np 2 valgrind --leak-check=full --show-reachable=yes ./foo
and got the following:
==6248== Warning: ignored attempt to set SIGRT32 handler in sigaction();
==6248== the SIGRT32 signal is used internally by Valgrind
cr_libinit.c:183 cri_init: sigaction() failed: Invalid argument
==6248== HEAP SUMMARY:
==6248== in use at exit: 0 bytes in 0 blocks
==6248== total heap usage: 1 allocs, 1 frees, 25 bytes allocated
==6248==
==6248== All heap blocks were freed -- no leaks are possible
==6248==
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 134
Any ideas about how to track this error down? Note that it only appears if cnt>2500!
If you're using GNU glibc, you can set the environment variable MALLOC_CHECK_ to 2 before running your program, to enable extra consistency checking on memory-allocation calls; see the glibc manual for details.
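For example, a sketch reusing the ./foo invocation from the question (depending on your MPI you may need its environment-forwarding option, e.g. MPICH's -genv, for the variable to reach all ranks):

export MALLOC_CHECK_=2
mpirun -np 2 ./foo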
The message you had above,
Warning: ignored attempt to set SIGRT32 handler in sigaction(); the
SIGRT32 signal is used internally by Valgrind cr_libinit.c:XXX
cri_init: sigaction() failed: Invalid argument
has to do with MPI's use (mine is mpich-3.1.4-9) of the BLCR checkpointing library (mine is blcr-0.8.5).
When my MPI had no BLCR support (run "mpiexec -info" and look at the "Checkpointing libraries available" line), Valgrind worked perfectly during my test phase.
When I recompiled my MPI with BLCR support (for checkpointing experiments), Valgrind had a heart attack: it stopped working entirely.
The conflict is nasty, because the two programs want to use the same signal to interrupt your running program, and they simply cannot share it. (In our case BLCR grabbed it first, leaving Valgrind walking on air.)
I will try running two different installations of MPI on the same machine (one with BLCR support and one without), and I hope to alternate happily between Valgrind and checkpointing.
UPDATE:
Even though checkpointing itself worked, it was no longer possible to run any parallel executables under mpiexec (programs that had worked before).
The mpiexec command itself crashed whenever the compiled-in BLCR checkpointing library was in use.
SOLUTION:
I recompiled MPI (mpich-3.1.4-9) without BLCR support (I removed BLCR completely) and installed the DMTCP checkpointing solution (dmtcp-2.4.4), which not only works transparently but is also faster than BLCR (you will find the benchmarks in the literature).
Now everything runs as intended :) and checkpointed jobs are handled properly too. In the future I will do heavier tests of DMTCP (using local files with heavy, active I/O from the parallel program).
PS: I also found out that MPICH has removed BLCR from its distribution completely (July 2016).
On running valgrind --leak-check=yes with my executable file I get the following errors:
==17325== 136 bytes in 1 blocks are possibly lost in loss record 17 of 21
==17325== at 0x4004C42: calloc (vg_replace_malloc.c:418)
==17325== by 0xCC5CA9: _dl_allocate_tls (in /lib/ld-2.5.so)
==17325== by 0xD0BF5C: pthread_create@@GLIBC_2.1 (in /lib/libpthread-2.5.so)
==17325== by 0x8049334: init (prog.c:238)
==17325== by 0x804C94F: main (prog.c:163)
It is pointing to my pthread_create call. I called pthread_detach after creating the thread, and I don't want to call pthread_join. I searched for this and found many people facing the same issue, but I couldn't find the exact reason for it. Is this due to the behavior of the pthread library? Can someone please point me to good references on this problem?
Calling pthread_join on a detached thread is illegal; don't detach the thread if you want to be able to join it. The "possibly lost" block in your report is the thread's thread-local storage (note the _dl_allocate_tls frame): if a detached thread is still running, or glibc is still caching its TLS block for reuse, when the process exits, Valgrind reports that allocation as possibly lost even though nothing is actually wrong.
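If you just want a clean Valgrind run, a minimal sketch (worker stands in for your thread function) is to skip the detach and join before exiting:

#include <pthread.h>

pthread_t tid;
if (pthread_create(&tid, NULL, worker, NULL) == 0)
	pthread_join(tid, NULL);   /* lets glibc reclaim the thread's TLS before exit */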