I have been looking into reducing the memory footprint of an application. Following on from a previous question (GDB - can I find large data elements in memory), I have found and removed most of the biggest culprits.
nm --size-sort was invaluable for finding the large items from the .bss section of the executables.
The memory footprint as viewed in pmap has dropped very substantially. But while continuing this work on another system (Ubuntu Pangolin, gcc 4.6.3), I have noticed the memory footprint of running processes is perfectly reasonable, and certainly much smaller than the .bss size.
Running the code through the debugger, it looks like the biggest symbols from the .bss section are not really being allocated until the data is accessed (i.e. I can set an array element from one of the big symbols, and the memory footprint grows by 16 MB).
The .bss section is just zero-initialised, so it is easy to imagine an implementation assigning virtual address space to it, but not actually assigning any real memory until it is used.
Is this a real difference in behaviour, or a difference in reporting between systems?
In Linux, zero-initialized pages are all initially mapped to the same shared "zero" physical page in memory. Using a copy-on-write method, a page is copied and re-mapped to a new physical page the first time you write to it, which in turn causes the memory footprint of the application to grow. It sounds like this is what is happening, as you suspect. This would hold for all Linux distros.
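A minimal sketch of watching this happen, assuming Linux, that /proc/self/statm is available, and a 4 KiB page size: a large zero-initialized array sits in .bss, but resident memory only grows once its pages are actually written.

#include <stdio.h>
#include <string.h>

#define SIZE (16 * 1024 * 1024)   /* 16 MB zero-initialized array, lives in .bss */
static char big[SIZE];

static long resident_kib(void)
{
    long pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f) {
        /* second field of statm is the number of resident pages */
        if (fscanf(f, "%*ld %ld", &pages) != 1)
            pages = 0;
        fclose(f);
    }
    return pages * 4;             /* assuming 4 KiB pages */
}

int main(void)
{
    printf("before touching .bss: %ld KiB resident\n", resident_kib());
    memset(big, 1, SIZE);         /* write every page of the array */
    printf("after touching .bss:  %ld KiB resident\n", resident_kib());
    return 0;
}

The first number should be small and the second should be roughly 16 MB larger, matching the copy-on-write behaviour described above.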
Related
I want to know how I can use a resource monitor of any kind (htop, top, etc.) to track the memory usage of a process. Let's write a simple C program.
int main() {
    while (1) {}
    return 0;
}
After compilation, the output executable a.out is only 16 KB:
$ ls -lah ./a.out [8:43:44]
-rwxr-xr-x 1 user staff 16K May 17 08:43 ./a.out
As I understand it, the code has no variables, no malloc, and no other kind of statement that requires additional memory beyond the code itself, which will be loaded into memory when running. Some additional memory for the stack pointer, frame pointer, etc. is expected, but it shouldn't be too much.
Interestingly, when I run the code, the System Monitor gives a very different opinion.
So I am using macOS, and the monitor states that the virtual memory usage is 30 GB+!
Okay?! Maybe this is due to some optimization, or some unique way that macOS manages memory. Let's try running it in an Ubuntu virtual machine with 1 GB of memory.
I know this looks more reasonable than 30 GB, but 2356 KB?
Am I looking at the wrong indicator?
As I understand it, the code has no variables, no malloc, and no other kind of statement that requires additional memory beyond the code itself, which will be loaded into memory when running.
Your code doesn't have much; but it is typically linked with some startup code that does things like preprocess command-line arguments, initialize parts of the C library, and call your main().
You'll also have a stack (e.g. so that the startup code can call your main()) that consumes memory (whether you use it or not).
When your program is started the executable loader will also "load" (map into your virtual address space) any shared libraries (e.g. C standard library, that's likely needed by the startup code you didn't write, even if you don't use it yourself).
The other thing that can happen is that when the startup code initializes the C standard library, the C standard library can initialize the heap (for things like malloc()), and something (the rest of C standard library initialization, the remainder of the startup code) could use malloc() even though the code you wrote doesn't use it.
Of course operating systems/virtual memory management uses pages; so the size of each of your program's sections (.text, .data, etc), each section in each shared library, your stack, your heap, etc; are rounded up to the page size. Depending on which computer it is, page size might be 4 KiB (16 KiB for recent ARM/M1 Apple machines); and if the startup code you didn't create wants 1 byte in the .data section it costs 4 KiB (or 16 KiB) of memory.
So I am using macOS, and the monitor states that the virtual memory usage is 30 GB+!
I'd guess that most of it is space that was allocated for heap; where a tiny amount of the space is used and most isn't. If you assume that there's 176 KiB of private memory (used by your program and its startup code) and 440 KiB of shared memory (used by shared libraries), and note that "32.54 GiB" is roughly 34,120,000 KiB; then maybe it's "34,120,000 - (176 + 440) = 34,119,384 KiB of space that was allocated but isn't actually being used".
I know this looks more reasonable than 30 GB, but 2356 KB?
Continuing the assumption that it's mostly "allocated but not used" heap space; it's good to understand how heap works. "Allocated but not used" space costs almost nothing, but asking the OS to allocate space (e.g. because the program actually used it all and ran out of "allocated but not used" space) involves some overhead. For this reason the C library tends to ask the OS for large pieces of "allocated but not used" space (to minimize the overhead by reducing the chance of needing to ask the OS for more space) and then splits it into tiny pieces when you call malloc().
With this in mind; and not forgetting that the startup code and libraries are "generic" and not likely to be optimized specifically for any one program; you can say that the best size for the heap's "allocated but not used" space is impossible to determine, but ranges from "maybe too small but it doesn't matter much" to "maybe too big but nobody cares". Different compilers and/or libraries and/or operating systems make different decisions; so the amount of "allocated but not used" space varies.
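As a rough illustration, assuming glibc on Linux (where small allocations in the main arena come from a heap grown with brk), you can watch the program break move in one big step even though only a little memory was requested:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *before = sbrk(0);            /* current end of the heap segment */
    for (int i = 0; i < 1000; i++) {
        void *p = malloc(16);          /* 1000 tiny allocations, ~16000 bytes total */
        (void)p;                       /* deliberately leaked; this is only a demo */
    }
    void *after = sbrk(0);
    printf("the break moved by %ld bytes\n",
           (long)((char *)after - (char *)before));
    return 0;
}

On a typical glibc system the break moves by far more than 16000 bytes, because the library grabbed a large chunk up front and is splitting it into the small pieces handed back by malloc().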
Am I looking at the wrong indicator?
I don't know (it depends on why you're looking at memory stats to begin with).
On modern machines the total virtual address space may be 131072 GiB (2^47 bytes, half of a 48-bit address space), where most is "not allocated"; so if you're worried that "allocated but not used" space is going to cause you to run out of "not allocated" space later then you're looking at the right indicator.
Typically people care more about (some subset of) "allocated and actually used space" though.
If you're worried about consuming too much actual RAM (e.g. worried about increasing the chance that swap space will be used by the OS, which could reduce performance of all software and not just yours) then you'd want to look at the "Real Memory Size"; but I suspect that this includes shared memory (which would be used by many programs and not just your program).
I'm trying to reconcile a few concepts.
I know that virtual memory is shared (mapped) between the kernel and all user processes, which I read here. I also know that when the compiler generates addresses for code + data, the kernel must load them at the correct virtual addresses for that process.
To constrain the scope of the question, I'll just mean gcc when I mention 'the compiler'.
So does the compiler need to be kept in step with each new release of an OS, to know not to place code or data at the high memory addresses reserved for the kernel? As in, someone writing that piece of the compiler must know those details of how the kernel plans to load the program (lest the compiler put executable code in high memory)?
Or am I confusing different concepts? I got a bit confused when going through this tutorial, especially at the very bottom where it has OS code in low memory addresses, because I thought Linux uses high memory for the kernel.
The compiler doesn't determine the address ranges in memory at which things are placed. That's handled by the OS.
When the program is first executed, the loader places the various portions of the program and its libraries in memory. For memory that's allocated dynamically, large chunks are allocated from the OS and then sometimes divided into smaller chunks.
The OS loader knows where to load things. And the OS's virtual memory allocation logic knows how to find safe, empty spaces in the address space the process uses.
I'm not sure what you mean by the "high memory addresses reserved for the kernel". If you're talking about a 2G/2G or 3G/1G split on a 32-bit operating system, that is a fundamental design element of those OSes that use it. It doesn't change with versions.
If you're talking about high physical memory, then no. Compilers don't care about physical memory.
Linux gives each application its own memory space, distinct from the kernel. The page table contains the translations between this memory space and physical RAM, and the kernel sets up the page table so there's no interference.
That said, the compiler usually doesn't even care where the program is loaded in memory. Why would it?
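A small illustration of that last point, assuming a typical modern Linux toolchain that produces position-independent executables with ASLR enabled:

#include <stdio.h>

int main(void)
{
    /* Run this twice: main() usually lands at a different virtual address
       each time, because the loader, not the compiler, decides where the
       program image ends up. */
    printf("main() is at %p in this run\n", (void *)main);
    return 0;
}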
I've been thinking for day about the following question:
In a common PC, when you allocate some memory, you ask the OS for it; the OS keeps track of which memory segments are occupied and which ones are not, and doesn't let you mess around with other programs' memory, etc.
But what about a microcontroller? A microcontroller doesn't have an operating system running, so when you ask for a bunch of memory, what is going on? You cannot simply access the memory chip at a random place because it may be occupied... Who keeps track of which parts of memory are already occupied, and gives you a free place to store something?
EDIT:
I've programmed microcontrollers in C... and I was thinking that the answer could be "language independent". But let me be more clear: suppose I have this program running on a microcontroller:
int i=0;
int d=3;
What makes sure that my i and d variables are not stored at the same place in memory?
I think the comments have already covered this...
To ask for memory implies you have some operating system (using a loose sense of the term) managing memory that you are mallocing from. First, as a general rule you shouldn't be mallocing memory on a microcontroller (I may get flamed for that statement). It can be done in some cases, but you are in control of your memory; you own the system with your application, so asking for memory means asking yourself for it.
Unless you have reasons why you cannot statically allocate your structures or arrays, or use a union if there are mutually exclusive code paths that each want much or all of the spare memory, you can try to allocate and free dynamically, but it is a harder system engineering problem to solve.
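For example, a hedged sketch of the union idea (the names and sizes here are made up):

#include <stdint.h>

/* Two code paths that never run at the same time can share one statically
   allocated buffer instead of each reserving their own. */
static union {
    uint8_t rx_frame[1024];   /* used only while receiving a frame */
    char    log_line[1024];   /* used only while formatting a log message */
} shared_buf;

This keeps the worst-case memory use known at link time, without needing a heap.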
There is a difference between run-time allocation of memory and compile-time allocation. Your example has nothing to do with the rest of the question:
int i=0;
int d=3;
The compiler at compile time allocates two locations in .data, one for each of those items. The linker and/or linker script manages where .data lives and what its size limits are; if .data is bigger than what is available you should get a linker warning, and if you don't, you need to fix your linker commands or script to match your system.
Run-time allocation is managed at run time, and where and how the memory is managed is determined by that library; even if you have plenty of memory, a bad or improperly written library could overlap .text, .data, .bss and/or the stack and cause a lot of problems.
Excessive use of the stack is also a pretty serious system engineering problem, one that people coming from non-embedded systems often overlook these days because there is so much memory available. It is a very real problem when dealing with embedded code on a microcontroller. You need to know your worst-case stack usage and leave room for at least that much memory, whether you are going to have a heap to dynamically allocate from or allocate everything statically.
I'm trying to understand how C allocates memory to global variables.
I'm working on a simple Kernel. So far it can't do much more than print to screen and enable interrupts. I'm now working on a basic physical memory manager.
My memory manager is a bitmap that sets a 1 or 0 if memory is allocated or available. I need to add the memory that my Kernel is using to the bitmap as 'allocated', so nothing overwrites it.
I can easily find out the start of the Kernel, as it's statically loaded to 0x100000. Figuring out the length shouldn't be too difficult either. The part I'm not sure about is: where are global variables put in memory?
Let's say my Kernel is 12K; I can then allocate those 3x 4K blocks of memory to it for protection. Do I need to allocate more to cover the variables it uses? Or are the variables part of that 12K?
Thank you for your help, I hope I am making enough sense.
Have a look at
http://www.geeksforgeeks.org/archives/14268
Your globals are mostly in the .bss section.
As the previous answer says, most variables are stored in the .bss section, but they can also be stored in the .data or .rodata section depending on whether the global variables are initialized or defined as const. After compiling you can use readelf -S kernel.bin to see exactly how much space each section will utilize. For the .bss section the memory is only occupied when the binary is loaded into memory and does not take any space on disk. This means that your compiled kernel binary will be smaller than the actual size it will later use when brought into memory (by GRUB, usually).
A simple way to figure out exactly how much data your kernel will use, besides using readelf, is to place the .bss section inside the .data section within your linker script. The kernel binary will then be the same size on disk as in memory (or actually a bit smaller in memory, since not all sections are copied by GRUB), but then at least you know the minimum amount of memory you need to allocate.
I'd recommend using a custom linker script (assuming you use gcc): it makes the layout of kernel sections explicit and customizable (to read more about linker scripts, read info ld). You can see an example of my OS's linker script here.
To see the default linker script, use the -v/--verbose option of ld.
Most global variables are located in the .data.* and .rodata.* sections; variables initialized to 0 (or left uninitialized) go in .bss.
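As a small illustration (the exact placement depends on the toolchain and flags, so treat the comments as typical rather than guaranteed):

int counter = 0;          /* zero-initialized global: usually ends up in .bss */
int answer  = 42;         /* non-zero initialized global: goes in .data       */
const char msg[] = "hi";  /* read-only data: usually .rodata                  */
static int hidden;        /* uninitialized, file-scope static: also .bss      */

int main(void) { return counter + answer + (int)msg[0] + hidden; }

Running readelf -S (or nm) on the resulting binary, as mentioned above, shows which section each symbol landed in and how big each section is.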
I would like to be able to debug how much total memory is being used by C program in a limited resource environment of 256 KB memory (currently I am testing in an emulator program).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much memory my C program is using (including globals, local variables [from the perspective of my main function loop], the program code itself, etc.)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is a Hitachi SH2; I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options, take the proper actions (probably marking a checkbox) so that the build process (namely, the linker) will generate a map file.
A map file of an embedded system will normally give you the information you need in a detailed fashion: the memory segments, their sizes, how much memory is utilized in each one, program memory, data memory, etc. There is usually a lot of data supplied by the map file, and you might need to write a script to calculate exactly what you need, or copy it to Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives that, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint at the start of your program. When the application stops there, zero the entire stack. Resume the application and let it work for a while. Finally, stop it and inspect the stack memory. You will see non-zero values instead of zeros. The used part of the stack extends until the remaining zeros begin.
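A minimal sketch of doing that inspection in code rather than by eye, assuming the stack has already been zeroed (from the debugger as described, or from startup code), and that STACK_START and STACK_SIZE are hypothetical names for symbols you define in your own linker setup:

#include <stdint.h>
#include <stddef.h>

extern uint8_t STACK_START[];      /* lowest address of the reserved stack region */
#define STACK_SIZE 4096u           /* hypothetical size from the linker script */

/* Assumes a descending stack: the untouched (still zero) part sits at the
   low addresses, so scan upward until the zeros stop. */
size_t stack_bytes_used(void)
{
    size_t untouched = 0;
    while (untouched < STACK_SIZE && STACK_START[untouched] == 0)
        untouched++;
    return STACK_SIZE - untouched;
}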
Generally you will have different sections in the generated map file, showing where data goes, like:
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text .... and so on!!!
with other attributes like Base, Size (hex), Size (dec), etc. for each section.
While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single-threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part is dynamically allocated data, but even then such data is allocated from the heap, which in most bare-metal, single-threaded environments is a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and the linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file is generated, and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst-case stack usage for any particular function. In a single-threaded environment, the stack usage from main() will give worst-case overall usage (although interrupt handlers need consideration: the linker is not thread or interrupt aware, and some architectures have separate interrupt stacks while others share the main stack).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
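A rough sketch of the kind of facility described above, assuming you can route your allocations through your own wrappers (the names are made up, and a real version would also need to preserve the usual alignment guarantees):

#include <stdlib.h>
#include <stdio.h>

static size_t live_bytes, live_blocks, total_allocs;

void *counted_malloc(size_t n)
{
    /* stash the size just before the block so counted_free() can account for it */
    size_t *p = malloc(n + sizeof(size_t));
    if (!p)
        return NULL;
    *p = n;
    live_bytes += n;
    live_blocks++;
    total_allocs++;
    return p + 1;
}

void counted_free(void *ptr)
{
    if (!ptr)
        return;
    size_t *p = (size_t *)ptr - 1;
    live_bytes -= *p;
    live_blocks--;
    free(p);
}

void heap_report(void)
{
    printf("heap: %zu bytes live in %zu blocks, %zu allocations so far\n",
           live_bytes, live_blocks, total_allocs);
}

Printing heap_report() periodically through your existing debug output gives a running picture of heap usage without needing library source or debugger support.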
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().