Monitor not showing the right memory usage - c

I want to know how I can use a resource monitor, of any kind (htop, top, etc.), to track the memory usage of a process. Let's write a simple C program:
int main() {
while(1){}
return 0;
}
After compilation, the output executable a.out is only 16 KB:
$ ls -lah ./a.out
-rwxr-xr-x 1 user staff 16K May 17 08:43 ./a.out
As I understand it, the code has no variables, no malloc, and no statements that require any memory beyond the code itself, which is loaded into memory when the program runs. Some additional memory for the stack pointer, frame pointer, etc. is expected, but it shouldn't be much.
Interestingly, when I run the code, the System Monitor gives a very different opinion.
I am using macOS, and the monitor states that the virtual memory usage is 30 GB+!
Okay?! Maybe this is due to some optimization, or some technique unique to how macOS manages memory. Let's try running it in an Ubuntu virtual machine with 1 GB of memory.
I know this looks more reasonable than 30 GB, but 2356 KB?
Am I looking at the wrong indicator?

As I understand it, the code has no variables, no malloc, and no statements that require any memory beyond the code itself, which is loaded into memory when the program runs.
Your code doesn't have much; but your code is typically linked with some startup code that does things like preprocess command line arguments, initialize parts of the C library, and call your main().
You'll also have a stack (e.g. so that the startup code can call your main()) that consumes memory (whether you use it or not).
When your program is started the executable loader will also "load" (map into your virtual address space) any shared libraries (e.g. C standard library, that's likely needed by the startup code you didn't write, even if you don't use it yourself).
The other thing that can happen is that when the startup code initializes the C standard library, the C standard library can initialize the heap (for things like malloc()), and something (the rest of C standard library initialization, the remainder of the startup code) could use malloc() even though the code you didn't write doesn't use it.
Of course operating systems/virtual memory management uses pages; so the size of each of your program's sections (.text, .data, etc), each section in each shared library, your stack, your heap, etc; are rounded up to the page size. Depending on which computer it is, page size might be 4 KiB (16 KiB for recent ARM/M1 Apple machines); and if the startup code you didn't create wants 1 byte in the .data section it costs 4 KiB (or 16 KiB) of memory.
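On Linux you can see these page-aligned mappings directly for your own process. A minimal sketch (Linux-specific; on macOS `vmmap <pid>` gives similar output):

```c
#include <stdio.h>

/* Dump every mapped region of the current process: the executable's
 * sections, the shared libraries, the stack and the heap each appear
 * as a separate page-aligned range. Linux-specific. */
int dump_maps(void) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;
    char line[512];
    while (fgets(line, sizeof line, maps))
        fputs(line, stdout);  /* one region per line: range, perms, backing file */
    fclose(maps);
    return 0;
}
```

Even for the empty while(1) program above, this listing shows far more than 16 KB of mapped address space.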
I am using macOS, and the monitor states that the virtual memory usage is 30 GB+!
I'd guess that most of it is space that was allocated for the heap, where a tiny amount of the space is used and most isn't. If you assume that there's 176 KiB of private memory (used by your program and its startup code) and 440 KiB of shared memory (used by shared libraries), and assume that "32.54 GiB" is roughly 34120000 KiB; then maybe it's "34120000 - (176 + 440) = 34119384 KiB of space that was allocated but isn't actually being used".
I know this looks more reasonable than 30 GB, but 2356 KB?
Continuing the assumption that it's mostly "allocated but not used" heap space; it's good to understand how heap works. "Allocated but not used" space costs almost nothing, but asking the OS to allocate space (e.g. because the program actually used it all and ran out of "allocated but not used" space) involves some overhead. For this reason the C library tends to ask the OS for large pieces of "allocated but not used" space (to minimize the overhead by reducing the chance of needing to ask the OS for more space) and then splits it into tiny pieces when you call malloc().
With this in mind, and not forgetting that the startup code and libraries are "generic" and not likely to be optimized specifically for any one program, you can say that the best size for the heap's "allocated but not used" space is impossible to determine, but ranges from "maybe too small, but it doesn't matter much" to "maybe too big, but nobody cares". Different compilers and/or libraries and/or operating systems make different decisions, so the amount of "allocated but not used" space varies.
Am I looking at the wrong indicator?
I don't know (it depends on why you're looking at memory stats to begin with).
On modern machines the total virtual address space may be 131072 GiB (where most is "not allocated"), so if you're worried that "allocated but not used" space is going to cause you to run out of "not allocated" space later then you're looking at the right indicator.
Typically people care more about (some subset of) "allocated and actually used space" though.
If you're worried about consuming too much actual RAM (e.g. worried about increasing the chance that swap space will be used by the OS, which could reduce performance of all software and not just yours) then you'd want to look at the "Real Memory Size"; but I suspect that this includes shared memory (which would be used by many programs and not just your program).

Related

What is the main origin of heap and stack memory division?

I have read a lot of explanations of heap and stack memory, and all of them are vague about the origin of this division. First of all, I understand how these memories work with software, but I don't understand the main source of the division. I assume that they are the same unspecialized physical memory, but...
For example, say we have a PC without any OS, and we want to create some bootable program in assembly language for x86. I assume we can do this (personally I don't know assembly, but some people write OSes anyway). So the main question is: can we already operate with a heap and a stack, or must we first create some memory management machinery? If so, how is that possible in terms of bare-metal programming?
Adding something to the other answer, which is fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
First: the heap is totally software, while the stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instructions) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separate stack area, totally different from the other RAM areas where the program could create a "heap".
Second: when speaking about "programs", one can (and should) think of the operating system (the OS) as a program, specialized in managing resources (memory included), and extensible with "applications" (which are programs). In such a scenario, stack and heap are managed in cooperation between the OS and the applications.
So, to reply to your main question, the 90%-correct answer is: on bare metal you already have a stack; perhaps you have to issue a few short instructions to set it up, but it is straightforward. But you don't have a heap; you must implement it in your program. First you set aside some memory to be used as the stack; then you can set aside some more memory to be used as a heap, not forgetting that you must reserve some memory for normal/static data. The part of the program that manages the heap must know what it is doing, using but never erratically overwriting the stack and the static data, to perform its functions.
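The "set aside some memory to be used as a heap" part can be sketched in C. This is a minimal bump allocator; the region size, the 8-byte alignment choice, and the name heap_alloc are all illustrative:

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 4096
static uint8_t heap[HEAP_SIZE];  /* the memory we set aside as "the heap" */
static size_t  heap_used = 0;    /* bump pointer: offset of next free byte */

/* Hand out the next chunk of the reserved region. There is no free():
 * a real heap would also track released blocks for reuse. */
void *heap_alloc(size_t size) {
    size = (size + 7u) & ~(size_t)7u;      /* round up to 8-byte alignment */
    if (size > HEAP_SIZE - heap_used)
        return NULL;                       /* out of heap memory */
    void *p = &heap[heap_used];
    heap_used += size;
    return p;
}
```

Because the region is just a static array, the linker places it away from the stack and the static data automatically, which is exactly the "not erratically overwriting" property described above.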

Does a compiler have to consider the kernel memory space when laying out memory?

I'm trying to reconcile a few concepts.
I know that virtual memory is shared (mapped) between the kernel and all user processes, which I read about here. I also know that when the compiler generates addresses for code + data, the kernel must load them at the correct virtual addresses for that process.
To constrain the scope of the question, I'll just mean gcc when I mention 'the compiler'.
So does the compiler need to be updated for each new release of an OS, to know not to place code or data at the high memory addresses reserved for the kernel? As in, must someone writing that piece of the compiler know those details of how the kernel plans to load the program (lest the compiler put executable code in high memory)?
Or am I confusing different concepts? I got a bit confused when going through this tutorial, especially at the very bottom where it has OS code in low memory addresses, because I thought Linux uses high memory for the kernel.
The compiler doesn't determine the address ranges in memory at which things are placed. That's handled by the OS.
When the program is first executed, the loader places the various portions of the program and its libraries in memory. For memory that's allocated dynamically, large chunks are allocated from the OS and then sometimes divided into smaller chunks.
The OS loader knows where to load things. And the OS's virtual memory allocation logic knows how to find safe, empty spaces in the address space the process uses.
I'm not sure what you mean by the "high memory addresses reserved for the kernel". If you're talking about a 2G/2G or 3G/1G split on a 32-bit operating system, that is a fundamental design element of those OSes that use it. It doesn't change with versions.
If you're talking about high physical memory, then no. Compilers don't care about physical memory.
Linux gives each application its own memory space, distinct from the kernel. The page table contains the translations between this memory space and physical RAM, and the kernel sets up the page table so there's no interference.
That said, the compiler usually doesn't even care where the program is loaded in memory. Why would it?

How does malloc() know where the heap starts?

When the OS loads a process into memory, it initializes the stack pointer to the virtual address where it has decided the stack should go in the process's virtual address space, and program code uses this register to know where stack variables are. My question is: how does malloc() know at what virtual address the heap starts? Does the heap always exist at the end of the data segment, and if so, how does malloc() know where that is? Or is it even one contiguous area of memory, or just randomly interspersed with other global variables in the data section?
malloc implementations are dependent on the operating system; so is the process that they use to get the beginning of the heap. On UNIX, this can be accomplished by calling sbrk(0) at initialization time. On other operating systems the process is different.
Note that you can implement malloc without knowing the location of the heap. You can initialize the free list to NULL, and call sbrk or a similar function with the allocation size each time a free element of the appropriate size is not found.
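A minimal sketch of that idea, assuming a UNIX-like system where sbrk(2) is available (it is deprecated on many modern systems); toy_malloc is an illustrative name, and there is no free list or alignment handling:

```c
#define _DEFAULT_SOURCE   /* expose sbrk in glibc's unistd.h */
#include <unistd.h>
#include <stdint.h>
#include <stddef.h>

/* A "malloc" that never frees: grow the program break by `size` bytes
 * and hand back the old break. It never needs to know where the heap
 * starts; the kernel tracks the current break for us. */
void *toy_malloc(size_t size) {
    void *p = sbrk((intptr_t)size);
    return (p == (void *)-1) ? NULL : p;
}
```

A real allocator would keep a free list so that freed blocks can be reused instead of growing the break on every call.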
This is only about Linux implementations of malloc.
Many malloc implementations on Linux or POSIX use the mmap(2) syscall to get some quite big range of memory. Then they can use munmap(2) to release it.
(It looks like sbrk(2) might not be used a lot any more; in particular, it is not ASLR friendly and might not be multi-thread friendly)
Both these syscalls can be quite expensive, so some implementations ask for memory (using mmap) in quite large chunks (e.g. in chunks of one or a few megabytes). Then they manage the free space as e.g. linked lists of blocks, etc. They handle small mallocs and large mallocs differently.
The mmap syscall usually does not give memory ranges starting at fixed addresses (notably because of ASLR).
Try running on your system a simple program printing the result of a single malloc (of e.g. 128 ints). You will probably observe different addresses from one run to the next (because of ASLR). And strace(1)-ing it is very instructive. Try also cat /proc/self/maps (or print the lines of /proc/self/maps inside your program). See proc(5).
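The suggested experiment, as a sketch:

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate room for 128 ints and print the address malloc returned.
 * Run the program several times: with ASLR enabled, the printed
 * address usually differs from one run to the next. */
void *one_malloc_address(void) {
    int *p = malloc(128 * sizeof *p);
    if (p)
        printf("malloc(128 * sizeof(int)) returned %p\n", (void *)p);
    return p;
}
```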
So there is no need to "start" the heap at some fixed address, and on many systems that does not even make any sense. The kernel hands out ranges of virtual addresses at random pages.
BTW, both GNU libc and musl libc are free software. You should look inside the source code of their malloc implementations. I find the source code of musl libc very readable.
On Windows, you use the Heap functions to get the process heap memory. The C runtime will allocate memory blocks on the heap using HeapAlloc and then use that to fulfil malloc requests.

Microcontroller memory allocation

I've been thinking for day about the following question:
In a common PC, when you allocate some memory, you ask the OS for it; the OS keeps track of which memory segments are occupied and which are not, doesn't let you mess around with other programs' memory, etc.
But what about a microcontroller? A microcontroller doesn't have an operating system running, so when you ask for a bunch of memory, what is going on? You cannot simply access the memory chip at a random place, because it may be occupied... Who keeps track of which parts of memory are already occupied, and gives you a free place to store something?
EDIT:
I've programmed microcontrollers in C... and I was thinking that the answer could be language-independent. But let me be more clear: suppose I have this program running on a microcontroller:
int i=0;
int d=3;
What makes sure that my i and d variables are not stored at the same place in memory?
I think the comments have already covered this...
To ask for memory means you have some operating system managing memory that you are mallocing from (using a loose sense of the term "operating system"). First, as a general rule you shouldn't be mallocing memory in a microcontroller (I may get flamed for that statement). It can be done in some cases, but you are in control of your memory; you own the system with your application, so asking for memory means asking yourself for it.
Unless you have reasons why you cannot statically allocate your structures or arrays (or use a union if there are mutually exclusive code paths that might each want much or all of the spare memory), you can try to allocate and free dynamically, but it is a harder system-engineering problem to solve.
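The union trick mentioned above, as a sketch; the member names and the 512-byte size are illustrative:

```c
#include <stdint.h>

/* Two mutually exclusive code paths (receiving a frame vs. programming
 * a flash page) share one statically allocated buffer instead of each
 * reserving its own, halving the RAM cost. */
typedef union {
    uint8_t rx_frame[512];    /* used only while receiving     */
    uint8_t flash_page[512];  /* used only while writing flash */
} shared_buf_t;

static shared_buf_t shared_buf;
```

This is safe only if the two code paths truly never run at the same time; the compiler cannot check that for you.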
There is a difference between runtime allocation of memory and compile-time allocation. Your example has nothing to do with the rest of the question:
int i=0;
int d=3;
The compiler, at compile time, allocates two locations in .data, one for each of those items. The linker and/or linker script manages where .data lives and what its size limits are; if .data is bigger than what is available you should get a linker warning, and if not, you need to fix your linker commands or script to match your system.
Runtime allocation is managed at runtime, and where and how it manages the memory is determined by that library; even if you have plenty of memory, a bad or improperly written library could overlap .text, .data, .bss and/or the stack and cause a lot of problems.
Excessive use of the stack is also a pretty serious system-engineering problem which, for people coming from non-embedded systems, is often overlooked these days because there is so much memory. It is a very real problem when dealing with embedded code on a microcontroller. You need to know your worst-case stack usage and leave room for at least that much memory, whether you have a heap to allocate from dynamically or allocate everything statically.

Determine total memory usage of embedded C program

I would like to be able to debug how much total memory is being used by a C program in a limited-resource environment of 256 KB memory (currently I am testing in an emulator program).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much my C program is using (including globals, local variables [from perspective of my main function loop], the program code itself etc..)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is Hitachi SH2, I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options, take the proper actions (mark a checkbox, probably) so that the build process (namely, the linker) will generate a map file.
A map file for an embedded system will normally give you the information you need in a detailed fashion: the memory segments, their sizes, how much memory is utilized in each one, program memory, data memory, etc. There is usually a lot of data in the map file, and you might need to write a script to calculate exactly what you need, or copy it to Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives it, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint at the start of your program. When the application stops there, zero the entire stack. Resume the application and let it work for a while. Finally, stop it and inspect the stack memory: you will see non-zero values instead of zeros where the stack was actually used; the used stack extends until the zeros start again.
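The same fill-and-inspect technique can be sketched in C. On a real target the fill is usually done in startup code using linker-provided symbols for the stack bounds; here a plain array stands in for the stack region, and the 0xAA pattern is an arbitrary choice (a non-zero pattern is safer than zeros, since real stack data is often zero):

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_FILL 0xAAu

/* Fill the stack region with a known pattern before the program runs. */
void stack_paint(uint8_t *region, size_t size) {
    for (size_t i = 0; i < size; i++)
        region[i] = STACK_FILL;
}

/* High-water mark: count the bytes that no longer hold the pattern.
 * Assumes the stack grows downward, so the untouched (still painted)
 * part sits at the low-address end of the region. */
size_t stack_high_water(const uint8_t *region, size_t size) {
    size_t untouched = 0;
    while (untouched < size && region[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;  /* bytes the stack actually reached */
}
```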
Generally you will have different sections in the map file generated by the linker, showing where data goes, like:
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text
... and so on, with other attributes like Base, Size (hex) and Size (dec) for each section.
While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single-threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part is dynamically allocated data, but even then such data is allocated from the heap, which in most bare-metal, single-threaded environments is a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file is generated and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst case stack usage for any particular function. In a single threaded environment, the stack usage from main() will give worst case overall usage (although interrupt handlers need consideration, the linker is not thread or interrupt aware, and some architectures have separate interrupt stacks, some are shared).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
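A sketch of such a do-it-yourself facility: wrappers around malloc/free that keep a live-allocation count and byte total by stashing each block's size in a header. The names counted_malloc/counted_free are illustrative, and a real version would also need thread safety:

```c
#include <stdlib.h>
#include <stddef.h>

/* Header stored before each payload; the union keeps the payload
 * aligned for any object type. */
typedef union {
    size_t size;
    max_align_t align;
} alloc_hdr;

static size_t live_allocs = 0;   /* blocks currently allocated   */
static size_t live_bytes  = 0;   /* payload bytes currently live */

void *counted_malloc(size_t size) {
    alloc_hdr *h = malloc(sizeof *h + size);  /* header + payload */
    if (!h)
        return NULL;
    h->size = size;                           /* remember the size */
    live_allocs++;
    live_bytes += size;
    return h + 1;                /* hand out the payload after the header */
}

void counted_free(void *ptr) {
    if (!ptr)
        return;
    alloc_hdr *h = (alloc_hdr *)ptr - 1;      /* step back to the header */
    live_allocs--;
    live_bytes -= h->size;
    free(h);
}

size_t heap_live_allocs(void) { return live_allocs; }
size_t heap_live_bytes(void)  { return live_bytes; }
```

Note this counts requested payload bytes only; the allocator's own per-block overhead and fragmentation are not included.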
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().
