I'm running into a situation where threads that I create as detached do not release their memory after they exit.
I have tried creating the threads in the following ways:
1-
    pthread_attr_setdetachstate(&pthread_attributes, PTHREAD_CREATE_DETACHED);
    pthread_create(&thread_id, &pthread_attributes, establish_connection,
                   (void *) establish_connection_arguments);
2-
    pthread_create(&thread_id, &pthread_attributes, establish_connection,
                   (void *) establish_connection_arguments);
    pthread_detach(thread_id);
3-
    pthread_create(&thread_id, &pthread_attributes, establish_connection,
                   (void *) establish_connection_arguments);

    void *establish_connection(void *arguments) {
        pthread_detach(pthread_self());
        return NULL;
    }
I'm sure the memory is still retained, as pmap confirms this.
Is it normal for pmap to still show the threads' memory after the threads have completed?
By default, glibc/nptl caches thread stacks to reuse them. This incurs a small cost of synchronization to add/remove elements from the cache list and a nontrivial (but hopefully not huge) memory cost, but avoids the cost of calling mmap and munmap every time a thread is created or destroyed. I don't suspect there's any way to change this default behavior without extremely fragile hacks.
Edit: Since you said that joinable threads are being released, here's my second guess at the reason: It's very difficult for the implementation to make a thread release its own stack, since it would have no stack to run on while performing the work to release its stack. It's possible to work around this limitation by writing assembly that does not need a stack to perform the munmap syscall immediately followed by self-termination, but I've never seen an implementation do it.
Most likely, the space is cached by whatever memory allocator is in use. Running the odd thread and not having the memory use go down afterwards is not a sign of a leak - you'd have to repeatedly open and close a lot of threads, then check process memory use.
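A rough sketch of that kind of check, assuming Linux (it reads VmSize from /proc/self/status, which is not portable). The names spawn_detached_batch and read_vm_size_kb are made up for this example and are not part of any library:

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

/* Incremented by each worker just before it returns. */
static atomic_int finished_threads;

static void *short_lived_worker(void *unused) {
    (void)unused;
    atomic_fetch_add(&finished_threads, 1);
    return NULL;
}

/* Create `count` detached threads and wait until every created thread
   has run its body. Returns the number of threads actually created. */
int spawn_detached_batch(int count) {
    pthread_attr_t attr;
    int created = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    atomic_store(&finished_threads, 0);
    for (int i = 0; i < count; i++) {
        pthread_t tid;
        if (pthread_create(&tid, &attr, short_lived_worker, NULL) == 0)
            created++;
    }
    pthread_attr_destroy(&attr);

    while (atomic_load(&finished_threads) < created)
        sched_yield();
    return created;
}

/* Linux-only assumption: parse the VmSize line (in kB) from
   /proc/self/status. Returns -1 if it cannot be read. */
long read_vm_size_kb(void) {
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "VmSize: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```

Calling spawn_detached_batch in a loop for many rounds and printing read_vm_size_kb after each round should show whether usage keeps growing (a leak) or levels off (cached stacks being reused).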
The pmap program is showing you the process' memory map. That memory is still used by the process. For example, if another thread is created, that memory might be used for its stack.
Now I am actively studying the code of memory managers jemalloc and tcmalloc. But I can't understand how these two managers track threads.
If I understand correctly, a new thread can be detected during memory allocation, after which a new thread cache is created. But how does tcmalloc / jemalloc detect when a thread is destroyed and the thread cache attached to it can be freed for a future use?
Searching Google did not turn up anything useful.
I can only answer for jemalloc, but the way it works is that when the thread cache is created it is associated with the thread specific data for that thread.
When you create thread specific data, you can give it a 'destructor', which is invoked when the thread is being destroyed. If you're using pthreads it's the pthread_key_create routine, which is the C way of creating thread specific data.
In the case of jemalloc, there is a bit of code in tcache.h, which hooks tcache_thread_cleanup with the tcache data (my source jemalloc-3.0.0):
malloc_tsd_funcs(JEMALLOC_INLINE, tcache, tcache_t *, NULL,
    tcache_thread_cleanup)
So when the thread is exited, the destructor gets called. It gets given the pointer to the cache for that thread and runs the tcache_thread_cleanup routine at that time.
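The same mechanism can be sketched with plain pthreads. This is not jemalloc's code, just a minimal illustration of a per-thread cache with a destructor. The names cache_key, get_thread_cache and cleanup_calls are invented for the example, and the 64-byte "cache" is an arbitrary placeholder:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_key_t cache_key;
static pthread_once_t cache_key_once = PTHREAD_ONCE_INIT;
static atomic_int cleanup_calls;  /* only here so the destructor can be observed */

/* Destructor: called at thread exit with that thread's non-NULL value. */
static void cache_cleanup(void *cache) {
    free(cache);
    atomic_fetch_add(&cleanup_calls, 1);
}

static void make_cache_key(void) {
    pthread_key_create(&cache_key, cache_cleanup);
}

/* Lazily create the calling thread's cache on first use. */
static void *get_thread_cache(void) {
    void *cache;

    pthread_once(&cache_key_once, make_cache_key);
    cache = pthread_getspecific(cache_key);
    if (cache == NULL) {
        cache = calloc(1, 64);
        if (cache != NULL)
            pthread_setspecific(cache_key, cache);
    }
    return cache;
}

static void *worker(void *unused) {
    (void)unused;
    get_thread_cache();
    return NULL;  /* cache_cleanup runs while this thread is exiting */
}

/* Run one worker to completion and return how many times the
   destructor has fired so far, or -1 if the thread could not start. */
int run_worker_and_count_cleanups(void) {
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);
    return atomic_load(&cleanup_calls);
}
```

The allocator never has to poll for dead threads: the threading library calls the destructor for it.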
Is a thread dynamically allocated memory?
I have been researching and have a fair understanding of threads and how they are used. I have specifically looked at the POSIX API for threads.
I am trying to understand thread creation and how it differs from a simple malloc call.
I understand that threads share certain memory segments with the parent process, but each has its own stack.
Any resources I can read through on this topic are appreciated. Thanks!
Thread creation and a malloc() call are completely different concepts. A malloc() call dynamically allocates the requested byte chunk of memory from the heap for the use of the program.
A thread, by contrast, can be considered a 'light-weight process'. A thread is an entity within a process, and every process has at least one thread to carry out its execution. The threads of a process share the process's virtual address space and all of its resources. When you create new threads in a process, each new thread gets its own user stack and is scheduled independently by the scheduler. To run concurrently, each thread also has its own context, which stores the state of the thread just before preemption, i.e. the contents of all the registers.
Is a thread dynamically allocated memory?
No, nothing of the sort. Threads have memory uniquely associated with them -- at least a stack -- but such memory is not the thread itself.
I am trying to understand thread creation and how it differs from a simple malloc call.
New thread creation is not even the same kind of thing as memory allocation. The two are not at all comparable.
Threading implementations that have direct OS support (not all do) are unlikely to rely on the C library to obtain memory for their stack, kernel data structures, or any other thread-implementation-associated data. On the other hand, those that do not have OS support, such as Linux's old "green" threads, are more likely to allocate memory via the C library. Even threading implementations without direct OS support have the option of using a system call to obtain the memory they need, just as malloc() itself must do. In any case, the memory obtained is not itself the thread.
Note also that the difference between threading systems with and without OS support is orthogonal to the threading API. For example, Linux's green threads and the now-ubiquitous, kernel-supported NPTL threads both implement the POSIX thread API.
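To make the "the memory is not the thread" point concrete, here is a small sketch in which the caller supplies the stack through pthread_attr_setstack and releases it after the thread is gone. The function name run_on_own_stack and the 256 KiB size are assumptions for the example. NPTL normally obtains stacks itself, and posix_memalign is used here only as one convenient way to get an aligned block:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed to be comfortably larger than PTHREAD_STACK_MIN. */
#define SKETCH_STACK_SIZE (256 * 1024)

static void *store_answer(void *out) {
    *(int *)out = 42;
    return NULL;
}

/* Run a thread on a stack allocated by the caller. Returns 0 on success. */
int run_on_own_stack(int *result) {
    void *stack = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    long page = sysconf(_SC_PAGESIZE);
    int rc;

    if (page <= 0 ||
        posix_memalign(&stack, (size_t)page, SKETCH_STACK_SIZE) != 0)
        return -1;
    if (pthread_attr_init(&attr) != 0) {
        free(stack);
        return -1;
    }

    rc = pthread_attr_setstack(&attr, stack, SKETCH_STACK_SIZE);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, store_answer, result);
    pthread_attr_destroy(&attr);
    if (rc == 0)
        rc = pthread_join(tid, NULL);

    /* The thread no longer exists here, but its stack block still does;
       it is ordinary memory that we have to release ourselves. */
    free(stack);
    return rc == 0 ? 0 : -1;
}
```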
It is expected that threads, on which pthread_detach() was not called, should be pthread_join()ed before the main thread returns from main() or calls exit().
However, what happens when this requirement is not met? What happens when a process terminates when it still contains unjoined and not detached threads?
I would find it odd to learn that these other threads’ resources will not be reclaimed until system reboot. However, if these resources will be reclaimed, then there may be little need to bother about joining or detaching, mightn’t it?
It is up to the operating system. Typical modern operating systems will indeed reclaim the memory and descriptors (handles) used by abandoned threads. This is similar to how dynamically allocated memory works: typical modern systems will reclaim it when a process exits, even if the process never explicitly freed the memory. For certain unusual programs, this can be a meaningful performance optimization, because freeing lots of small resources takes time and the OS may be able to do it more quickly.
However, what happens when this requirement is not met? What happens when a process terminates when it still contains unjoined and not detached threads?
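On any system with POSIX threads that is not ancient, the non-joined threads simply "evaporate" into space when the main thread returns from main() or calls exit(), which terminates the whole process (on Linux this goes through the exit_group system call; a bare SYS_exit would end only the calling thread).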
I would find it odd to learn that these other threads’ resources will not be reclaimed until system reboot.
They will be.
However, if these resources will be reclaimed, then there may be little need to bother about joining or detaching, mightn’t it?
It depends on what these threads do. The danger is at-exit data races.
In C++, global variables are destroyed during exit (usually via atexit or an equivalent registration mechanism), FILE streams are flushed and closed, and so on.
If a non-joined thread tries to access any such resource, it will likely crash with SIGSEGV, possibly producing a core dump and an unclean process exit code, both of which are often quite undesirable.
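A minimal sketch of the safe pattern: join every worker before main() returns, so no thread can still be touching process-wide state while it is being torn down. WORKER_COUNT, shared_table and run_and_join_workers are names made up for the example, and shared_table just stands in for whatever global resource the threads use:

```c
#include <pthread.h>

#define WORKER_COUNT 4

/* Stands in for process-wide state (globals, stdio streams, ...). */
static int shared_table[WORKER_COUNT];

static void *fill_slot(void *arg) {
    int id = *(int *)arg;
    shared_table[id] = id + 1;
    return NULL;
}

/* Start the workers and join every one that was started.
   Returns the number of threads successfully joined. */
int run_and_join_workers(void) {
    pthread_t tids[WORKER_COUNT];
    int ids[WORKER_COUNT];
    int started = 0;
    int joined = 0;

    for (int i = 0; i < WORKER_COUNT; i++) {
        ids[i] = i;
        if (pthread_create(&tids[started], NULL, fill_slot, &ids[i]) == 0)
            started++;
    }
    /* After this loop no worker is running, so exiting is race-free. */
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    }
    return joined;
}
```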
Is it a bad practice to free memory across threads? Such that a thread allocates memory and, after exiting, passes the pointer to the main thread to free the memory. I feel like the answer is yes but I'm just wondering.
The purpose of this in my code is so that the main thread can do some other stuff with the memory before it gets freed. There's plenty of workarounds, in my case, which I'm totally fine with using. But having a thread return void * to a block of memory can, in my case, make the code pretty convenient.
EDIT: I know there are no technical faults in doing this.
It's not wrong for a thread to pass control of memory it has allocated to another thread. For example, in a producer/consumer model, it would be very reasonable for the producer thread to allocate memory for whatever it is that it produces, and then hand control over that memory to the consumer thread for the consumer thread to use and release.
It's not "bad practice" as long as it makes sense for your data flow model, and in particular for the requirements your program has on object lifetimes, but it can incur costs. Many modern allocators use thread-local arenas, where allocating and freeing an object in the same thread incurs no synchronization penalty, but freeing it in a different thread forces synchronization or incurs other costs. I would not change your design for this reason unless it's a major bottleneck, but with this implementation detail in mind you could also consider other designs, such as having the thread store its output in a buffer provided by the parent thread in the argument to the thread start function.
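The last alternative could look roughly like this. The struct sum_job layout and the function names are invented for the example. The point is only that the parent owns the output storage, so the worker never allocates anything:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical argument block: input and output both owned by the parent. */
struct sum_job {
    const int *values;
    size_t count;
    long total;  /* written by the worker, read by the parent after join */
};

static void *sum_values(void *arg) {
    struct sum_job *job = arg;

    job->total = 0;
    for (size_t i = 0; i < job->count; i++)
        job->total += job->values[i];
    return NULL;
}

/* Run the job in its own thread. Returns 0 on success, -1 on failure. */
int run_sum_job(struct sum_job *job) {
    pthread_t tid;

    if (pthread_create(&tid, NULL, sum_values, job) != 0)
        return -1;
    return pthread_join(tid, NULL) == 0 ? 0 : -1;
}
```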
All threads share a common heap. It doesn't matter which thread allocates or frees the memory, as long as the other threads are done using the memory when it gets freed.
Dynamic memory usage comes with the responsibility of being in complete control of it. It is the user's responsibility to explicitly manage the lifetime of a dynamically allocated object and to ensure it is deallocated once its expected lifetime ends. There is nothing wrong with using dynamically allocated memory blocks across different threads. All the threads in the same process share the same heap area. The only thing one needs to take care of is that object lifetimes are clearly defined and scoped.
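For the case in the question (a thread returning a void * to memory it allocated), a minimal sketch might look like this. make_squares and sum_squares_via_thread are names invented for the example:

```c
#include <pthread.h>
#include <stdlib.h>

/* Worker: allocates an array and hands it back as the thread's result. */
static void *make_squares(void *arg) {
    int n = *(int *)arg;
    int *squares = malloc((size_t)n * sizeof *squares);

    if (squares != NULL) {
        for (int i = 0; i < n; i++)
            squares[i] = i * i;
    }
    return squares;  /* ownership passes to whoever joins this thread */
}

/* Returns the sum of the squares 0..n-1 computed by the worker,
   or -1 on failure. */
long sum_squares_via_thread(int n) {
    pthread_t tid;
    void *ret = NULL;
    int *squares;
    long sum = 0;

    if (n <= 0 || pthread_create(&tid, NULL, make_squares, &n) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0 || ret == NULL)
        return -1;

    squares = ret;
    for (int i = 0; i < n; i++)
        sum += squares[i];
    free(squares);  /* freed by a different thread than the one that allocated it */
    return sum;
}
```

pthread_join is what makes the lifetime clear here: once it returns, the worker is finished with the block and the joining thread is its only owner.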
I am experiencing some memory allocation problems and am trying to track down the possible causes.
There are many possible reasons, and lots of hours must be spent to check each of them.
One of the possible reasons is that there is a memory buffer, that is allocated within a thread, and this buffer is used after the thread terminates.
So, if there is a chance that thread termination causes memory deallocation, then many hours of debugging may be avoided.
Thank you very much in advance.
I don't think it does, although it of course might depend on your particular details.
Generally, memory allocation from the operating system's point of view is a per-process activity, while threads exist inside the process. So if one thread allocates memory and then dies, the operating system doesn't clean that up, since the process is still alive. Memory is shared inside the process, so the OS can't know that the memory is no longer used and can be cleaned up.
No, threads that 'die' do not deallocate any memory.
When a thread ends, the thread itself vanishes from memory, like a function does once it's done executing. It will take all the 'stack' objects with it, but all the memory you allocated yourself (i.e. malloc) will still be there.
As such, before you end your thread, you should make sure that all dynamic memory that was used by the thread and is not needed any more is freed properly.
Anything on the thread's stack (a local variable, for example) becomes invalid when the thread ends. However, if the data is in the heap, then the memory is still valid as long as the process is running. Of course, you'll need to save the pointer to that heap allocation somewhere outside that thread.
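A small sketch of that difference, with made-up names (published, build_message, message_length_after_exit). The worker copies a stack-local string to the heap and stores the pointer in a variable that lives outside the thread:

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Lives outside the worker, so the pointer survives the thread. */
static char *published;

static void *build_message(void *unused) {
    /* `local` is on the worker's stack and is gone once the thread ends. */
    char local[] = "built by worker";
    char *copy = malloc(sizeof local);

    (void)unused;
    if (copy != NULL)
        memcpy(copy, local, sizeof local);
    published = copy;  /* the heap copy stays valid after the thread exits */
    return NULL;
}

/* Returns the length of the message as read after the worker has
   terminated, or -1 on failure. */
int message_length_after_exit(void) {
    pthread_t tid;
    int len;

    if (pthread_create(&tid, NULL, build_message, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0 || published == NULL)
        return -1;

    len = (int)strlen(published);  /* safe: the heap block outlived the thread */
    free(published);
    published = NULL;
    return len;
}
```

Storing the address of `local` in `published` instead would leave a dangling pointer once the worker returned.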
Memory allocated by a thread behaves like memory allocated by a method call:
variables on the stack will be deallocated when the method returns (thread terminates)
variables on the heap will continue to be allocated unless explicitly deallocated.
In addition to the other answers, I'd like to note that pthreads has thread-specific data (TLS) keys, which are registered with pthread_key_create; it takes a pointer to the key and a destructor function. When a thread exits, the implementation iterates through the keys and invokes the destructor of every key that holds a non-NULL value in that thread (in some implementations this is a static pthread_key_clean_all() called on pthread_exit). Those destructors may perform memory deallocation, by application design.
So, to rule this out, search your code for all pthread_key_create invocations, check whether a destructor is assigned, and then put breakpoints on each destructor to see what is destroyed and in which order.