Do shared libraries use the same heap as the application?

Say I have an application on Linux that uses shared libraries (.so files). My question is whether the code in those libraries will allocate memory in the same heap as the main application, or whether they use their own heap.
So for example, if some function in the .so file calls malloc, would it use the same heap manager as the application or another one? Also, what about the global data in those shared libraries, where does it lie? I know that for the application it lies in the bss and data segments, but I don't know where it is for those shared object files.

My question is whether the code in those libraries will allocate memory in the same heap as the main application, or whether they use their own heap?
If the library uses the same malloc/free as the application (e.g. both come from glibc), then yes: the program and all its libraries will use a single heap.
If a library uses mmap directly, it can allocate memory that is not part of the heap used by the program itself.
So for example, if some function in the .so file calls malloc, would it use the same heap manager as the application or another one?
If a function from a .so calls malloc, it is the same malloc as the one called from the program. You can see the symbol-binding log on Linux/glibc (>2.1) with
LD_DEBUG=bindings ./your_program
Yes: several instances of heap managers (with the default configuration) can't co-exist without knowing about each other (the problem is keeping the size of the brk-allocated heap synchronized between instances). But a configuration is possible in which several instances can co-exist.
Most classic malloc implementations (ptmalloc*, dlmalloc, etc.) can use two methods of getting memory from the system: brk and mmap. brk is the classic heap: it is linear and can grow or shrink. mmap lets you get a lot of memory anywhere, and you can return this memory to the system (free it) in any order.
When malloc is built, the brk method can be disabled. Then malloc will emulate a linear heap using only mmap, or will even disable the classic linear heap entirely so that all allocations are made from discontiguous mmaped fragments.
So, a library can have its own memory manager, e.g. a malloc compiled with brk disabled, or a non-malloc memory manager. Such a manager should use function names other than malloc and free (for example malloc1 and free1), or should not export those names to the dynamic linker.
Also, what about the global data in those shared libraries, where does it lie? I know that for the application it lies in the bss and data segments, but I don't know where it is for those shared object files.
You should think of both the program and the .so simply as ELF files. Every ELF file has "program headers" (readelf -l elf_file). How data is loaded from the ELF file into memory depends on the program header's type. If the type is "LOAD", the corresponding part of the file will be privately mmaped (sic!) into memory. Usually there are two LOAD segments: the first for code, with R+X (read+execute) flags, and the second for data, with R+W (read+write) flags. Both the .bss and .data (global data) sections are placed in the LOAD segment that has the write flag enabled.
Both executables and shared libraries have LOAD segments. Some segments have memory_size > file_size, which means the segment will be expanded in memory: the first part is filled with data from the ELF file, and the remaining (memory_size - file_size) part is filled with zeros (for the *bss sections), using mmap(/dev/zero) and memset(0).
When the kernel or the dynamic linker loads an ELF file into memory, it does not think about sharing. For example, say you start the same program twice. The first process loads the read-only part of the ELF file with mmap; the second process does the same mmap (if ASLR is active, the second mmap will land at a different virtual address). It is the task of the page cache (the VFS subsystem) to keep a single copy of the data in physical memory (with copy-on-write, aka COW); mmap just sets up mappings from a virtual address in each process to that single physical location. If any process changes a memory page, it is copied on write into a unique private physical page.
The loading code is in glibc/elf/dl-load.c (_dl_map_object_from_fd) for ld.so, and in linux-kernel/fs/binfmt_elf.c for the kernel's ELF loader (elf_map, load_elf_binary). Do a search for PT_LOAD.
So, global data and bss data are always privately mmaped in each process, and they are protected with COW.
Heap and stack are allocated at run time, with brk+mmap for the heap, and automagically by the OS kernel, in a brk-like process, for the main thread's stack. Additional threads' stacks are allocated with mmap in pthread_create.

Symbol tables are shared across an entire process in Linux. The malloc() seen by any part of the process is the same as the one seen by all the other parts. So yes, if all parts of a process access the heap via malloc() et alia, then they will share the same heap.

Related

About memory allocation, does C malloc/calloc depends on Linux mmap/malloc or the opposite?

As far as I know, C has functions such as malloc, calloc and realloc to allocate memory. And the Linux kernel also has functions to allocate memory, e.g.: mmap, kmalloc, vmalloc...
I want to know which is the lowest-level function.
If you say, "the Linux kernel is the lowest level, your C program must allocate memory through the Linux kernel", then how does the Linux kernel allocate its own memory?
Or, if the Linux kernel is the lowest level, does that mean that when I write a C program and run it on a Linux system, I should allocate memory through a system call?
Hope to have an answer.
I want to know which is the lowest-level function.
The user-level malloc function calls brk or mmap (depending on the library used and on the Linux version).
... how does the Linux kernel allocate its own memory?
On a system without MMU this is easy:
Let's say we have a system with 8 MB of RAM, and we know that the RAM occupies addresses 0x20000000 to 0x20800000.
The kernel contains an array that contains information about which pages are in use. Let's say the size of a "page" is 0x1000 (this is the page size in x86 systems with MMU).
In old Linux versions the array was named mem_map. Each element in the array corresponds to one memory "page". It is zero if the page is free.
When the system is started, the kernel itself initializes this array (writes the initial values in the array).
If the kernel needs one page of memory, it searches for an element in the array whose value is 0. Let's say mem_map[0x123] is 0. The kernel now sets mem_map[0x123]=1;. mem_map[0x123] corresponds to the address 0x20123000, so the kernel "allocated" some memory at address 0x20123000.
If the kernel wants to "free" some memory at address 0x20234000, it simply sets mem_map[0x234]=0;.
On a system with MMU, it is a bit more complicated but the principle is the same.
On Linux, the C functions malloc, calloc and realloc used by user-mode programs are implemented in the C library; they manage pages of memory mapped into the process address space using the mmap system call. mmap associates pages of virtual memory with addresses in the process address space. When the process then accesses those addresses, actual RAM is mapped by the kernel into this virtual space. Not every call to malloc maps memory pages; only those for which not enough space was already requested from the system.
In the kernel space, a similar process takes place but the caller can require that the RAM be mapped immediately.

Is it possible to share the data section (variables) of a shared lib that is `dlopen`ed twice in the same process? [duplicate]

I can't seem to find an answer after searching for this out on the net.
When I use dlopen, the first call seems to take longer than any call after that, including calls from other instances of the program.
Does dlopen load the .so into memory once and have the OS keep it around, so that any following calls, even from another instance of the program, point to the same spot in memory?
So basically, do 3 instances of a program using a library mean 3 instances of the same .so loaded into memory, or is there only one instance in memory?
Thanks
Does dlopen load the .so into memory once and have the OS keep it around, so that any following calls, even from another instance of the program, point to the same spot in memory?
Multiple calls to dlopen from within a single process are guaranteed to not load the library more than once. From the man page:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it.
When the first call to dlopen happens, the library is mmaped into the calling process. There are usually at least two separate mmap calls: the .text and .rodata sections (which usually reside in a single RO segment) are mapped read-only, the .data and .bss sections are mapped read-write.
A subsequent dlopen from another process performs the same mmaps. However the OS does not have to load any of the read-only data from disk -- it merely increments reference counts on the pages already loaded for the first dlopen call. That is the sharing in "shared library".
So basically, do 3 instances of a program using a library mean 3 instances of the same .so loaded into memory, or is there only one instance in memory?
Depends on what you call an "instance".
Each process will have its own set of (dynamically allocated) runtime loader structures describing this library, and each set will contain an "instance" of the shared library (which can be loaded at a different address in each process). Each process will also have its own instance of the writable data (which uses copy-on-write semantics). But the read-only mappings will all occupy the same physical memory (though they can appear at different addresses in each of the processes).

How is memory layout shared with other processes/threads?

I'm currently learning memory layout in C. For now I know there are several sections in a C program's memory: text, data, bss, heap and stack. They also say the heap is shared with other things beyond the program.
My questions are these.
What exactly is the heap shared with? One source states that Heap must always be freed in order to make it available for other processes whereas another says The heap area is shared by all threads, shared libraries, and dynamically loaded modules in a process. If it is not shared with other processes, do I really have to free it while my program is running (not at the end of it)?
Some sources also single out high addresses (the sixth section) for command line arguments and environment variables. Shall this be considered as another layer and a part of a program memory?
Are the other sections shared with anything else beyond a program?
The heap is per-process memory: each process has its own heap, which is shared only within the same process space (for example between the process's threads, as you said). Why should you free it? Not really to give space to other processes (at least in modern OSes, where the process memory is reclaimed by the OS when the process dies), but to prevent heap exhaustion within your own process memory: in C, if you don't deallocate the heap regions you used, they will always be considered busy even when they are no longer used. Thus, to prevent undesired errors, it's good practice to free memory in the heap as soon as you don't need it anymore.
In a C program, the command-line arguments are stored on the stack, as they are passed to main. What happens is that the stack is usually allocated in the highest portion of a process's memory, mapped at the high addresses (this is probably why some sources point out what you wrote). But, generally speaking, there isn't a sixth memory area.
As said by the others, the text area can be shared between processes. This area usually contains the binary code, which is the same for different processes running the same binary. For performance reasons, the OS can share such a memory area (think, for example, of what happens when you fork a child process).
Heap is shared with other processes in a sense that all processes use RAM. The more of it you use, the less is available to other programs. Heap sharing with other threads in your own program means that all your threads actually see and access the same heap (same virtual address space, same actual RAM, with some luck also same cache).
No.
text can be shared with other processes. These days it is marked read-only, so having several processes share text makes sense. In practice this means that if you are already running top and you start another instance, it makes no sense to load the text part again: this would waste time and physical RAM. If the OS is smart enough, it can map those RAM pages into the virtual address spaces of both top instances, saving time and space.
On the official aspect:
The terms thread, process, text section, data section, bss, heap and stack are not even defined by the C language standard, and every platform is free to implement these components however "it may like".
Threads and processes are typically implemented at the operating-system layer, while all the different memory sections are typically implemented at the compiler layer.
On the practical aspect:
For every given process, all these memory sections (text section, data section, bss, heap and stack) are shared by all the threads of that process.
Hence, it is under the responsibility of the programmer to ensure mutual-exclusion when accessing these memory sections from different threads.
Typically, this is achieved via synchronization utilities such as semaphores, mutexes and message queues.
In between processes, it is under the responsibility of the operating system to ensure mutual-exclusion.
Typically, this is achieved via virtual-memory abstraction, where each process runs inside its own logical address space, and each logical address space is mapped to a different physical address space.
Disclaimer: some would claim that each thread has its own stack. Technically speaking, each thread does get its own stack region within the process's address space, but there is usually nothing to prevent a thread from accessing the stacks of other threads, whether intentionally or by mistake (aka stack overflow).

Dynamic Library Injection

Where does an injected DLL reside in memory?
My understanding of DLLs is that they get loaded once in physical memory so that multiple files can use their functions, thus saving resources by having reusable code loaded once instead of in every single process. The DLL's functions are located outside the process' address space and can be found via the memory mapping segment in between the heap and stack segments. For each process the memory map segment is located at a different virtual address, however the libraries/functions the memory mapped segment points to must be at the same address for each process otherwise they couldn't be shared between processes (assuming my understanding of this is all correct).
So when it's said that a DLL is loaded into a process' address space, is it loaded into the memory of that process, or somewhere else in the system and the process adds to the memory map to locate it? And on that note, what allows an injected library to migrate from one process to another?

process allocated memory blocks from kernel

I need a reliable measurement of the allocated memory in a Linux process. I've been looking into mallinfo, but I've read that it is deprecated. What is the state-of-the-art alternative for this sort of statistics?
basically i'm interested in at least two numbers:
the number (and size) of memory blocks/pages allocated from the kernel by malloc or whatever allocator implementation the C library of choice uses
(optional but still important) number of allocated memory by userspace code (via malloc, new, etc.) minus the deallocated memory by it (via free, delete, etc.)
One possibility I have is to override malloc calls with LD_PRELOAD, but it might introduce unwanted overhead at runtime, and it might not interact properly with other libraries I'm using that also rely on LD_PRELOAD aop-ness.
Another possibility I've read about is rusage.
To be clear, this is NOT for debugging purposes; the memory usage is an intrinsic feature of the application (similar to Mathematica or Matlab, which display the amount of memory used, only more precise, at the block level).
For this purpose, a "memory usage" introspection feature within an application, the most appropriate interface is malloc_hook(3). These are GNU extensions that allow you to hook every malloc(), realloc() and free() call, maintaining your own statistics. Note, however, that these hooks are deprecated and were removed in glibc 2.34.
To see how much memory is mapped by your application from the kernel's point of view, you can read and collate the information in the /proc/self/smaps pseudofile. This also lets you see how much of each allocation is resident, swapped, shared/private, clean/dirty etc.
/proc/PID/status contains a few useful pieces of information (try running cat /proc/$$/status for example).
VmPeak is the largest your process's virtual memory space ever became during its execution. This includes all pages mapped into your process, including executable pages, mmap'ed files, stack, and heap.
VmSize is the current size of your process's virtual memory space.
VmRSS is the Resident Set Size of your process; i.e., how much of it is taking up physical RAM right now. (A typical process will have lots of stuff mapped that it never uses, like most of the C library. If no processes need a page, eventually it will be evicted and become non-resident. RSS measures the pages that remain resident and are mapped into your process.)
VmHWM is the High Water Mark of VmRSS; i.e. the highest that number has been during the lifetime of the process.
VmData is the size of your process's "data" segment; i.e., roughly its heap usage. Note that small blocks on which you have done malloc and then free will still be in use from the kernel's point of view; large blocks will actually be returned to the kernel when freed. (If memory serves, "large" means greater than 128k for current glibc.) This is probably the closest to what you are looking for.
These measurements are probably better than trying to track malloc and free, since they indicate what is "really going on" from a system-wide point of view. Just because you have called free() on some memory, that does not mean it has been returned to the system for other processes to use.
