How much memory does a C program take? [closed] - c

I am developing an in-memory database as a side project which is supposed to be lightweight. I haven't been programming in C since school and my knowledge of computer architecture is limited...
I am wondering how I can calculate exactly how much memory my program will take, and from which kind of memory (RAM, registers, ...).
The most obvious is everything I allocate through malloc. Sorry if the following questions are a bit random...
Will global variables be stored in RAM? Does the keyword static (to limit the scope) influence anything?
Are all global variables allocated at the same time, or could they be lazily allocated on first access?
Is the executable loaded into memory? Will an executable of 1 MB take 1 MB during execution?
This subject is a pretty big one so don't hesitate to point me to a book or a website. I guess it's not only about C but more about the computer architecture, the assembly code etc.

I'm assuming typical computing platforms, not embedded systems.
Will global variables be stored in RAM? Does the keyword static (to limit the scope) influence anything?
Global variables will be stored in RAM only if the operating system thinks that's the best use for RAM. Scope has no effect.
Are all global variables allocated at the same time, or could they be lazily allocated on first access?
It depends what you mean by "allocated". Typically virtual memory (address space) is allocated all at once, but physical memory (RAM) is allocated as needed.
Is the executable loaded into memory? Will an executable of 1 MB take 1 MB during execution?
It is mapped into memory at program start. It is actually loaded into physical memory as needed and evicted from physical memory as the OS deems appropriate.
I strongly suspect you are looking for simple answers to very complex questions.

Yes, but that doesn't mean they're all mapped at any given point in time.
Whether they can be lazily allocated depends on what you mean by that. They will all be mapped to virtual addresses, but if the program never accesses the variables, the OS might never need to map those addresses to actual physical RAM.
It depends, but most modern desktop/server operating systems will page the code in as needed, I think.
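You can observe this demand-paging behavior yourself. Below is a minimal sketch, assuming Linux (it reads /proc/self/statm, which is not portable): a large malloc immediately grows the process's virtual size, while physical pages are only assigned once the memory is actually touched.

```c
#include <stdio.h>

/* Return the process's total virtual size in pages, as reported by
   the first field of /proc/self/statm (Linux-specific). */
static long virtual_pages(void) {
    long vsz = 0, rss = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld %ld", &vsz, &rss) != 2)
        vsz = -1;
    fclose(f);
    return vsz;
}
```

Calling this before and after a multi-megabyte malloc shows the virtual size jumping immediately, even though the resident set barely changes until the buffer is written to.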

Oops, that's an interesting question, but the answer is as usual: it depends!
Your questions are heavily implementation dependent. Old (now outdated) systems had the notion of overlays: parts of the code were only loaded into memory when needed. I do not think it is still used with modern virtual memory systems, but it could make sense on embedded systems with limited resources.
Some compilers also have options to set the size of the stack. That can be decisive for a lightweight program.
And there is an obvious dependency on architecture: on Unix/Linux, you have the ELF vs. a.out formats with different memory requirements and management; on Windows, there is still the old .com format that can lead to really tiny executables.

Related

Why do MCU compilers for chips like AVR or ESP (used widely by Arduino) keep all strings in SRAM heap by default? [closed]

There is a common technique used in Arduino world, where you can use PROGMEM macros in order to keep strings and other similar data in flash memory instead of SRAM to keep lower RAM usage, while sacrificing some performance - https://www.arduino.cc/reference/en/language/variables/utilities/progmem/
Basically, instead of storing these in SRAM, there is just some reference to a FLASH address where the string is stored and loaded from on the fly, in order to save RAM.
But I can't understand why MCU compilers put all strings, including local strings from functions, into heap memory and keep them there all the time in the first place. Also, I don't understand how a compiler can "store anything in RAM instead of flash": RAM is volatile, so the compiler can hardly "store" anything there, as it's cleared on every reset. These strings must still be present in the program image stored in flash, so why are they copied from flash to RAM on each launch of the MCU? I was thinking that maybe the whole program image must be loaded into RAM for execution, but that doesn't make sense, as these chips use a Harvard architecture and the program is executed from flash already (and most of these chips have much bigger flash than RAM anyway, so the whole image would never fit into RAM).
While I understand how to use workarounds that prevent this behaviour, I can't understand why this behaviour exists in the first place. Can someone shed some light on it? Why are all strings loaded into RAM on start of the program by default? Is that for performance reasons?
The AVR architecture is different from many other common architectures in that the code and data exist in completely different memory spaces (though the program memory can be accessed as data, as shown in the PROGMEM documentation page to which you linked). This is one type of modified Harvard architecture.
Most other architectures that you're likely to use present themselves to the user as having code and data exist in the same memory space. While this is often also done with a modified Harvard architecture, they present themselves to the user as a von Neumann architecture, having a unified code and data memory space.
On AVR, to make initialized global or static data available to use like any other in-memory data, part of the program startup code copies the initialization data from program memory into RAM. This is generally done for program segments with names like .data or .rodata, depending on whether or not the variables in question are const.
Note that, contrary to what you say in your question, this data is not copied to the heap, it's stored in some portion of RAM chosen during program linking.
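What that startup code does can be imitated in portable C. This is only an illustrative sketch: flash_image and ram_data are hypothetical stand-ins for the flash-resident load image and its RAM copy, not real linker symbols (real startup code uses linker-provided symbols such as __data_load_start).

```c
#include <string.h>

/* Hypothetical sketch of what C startup code does for the .data
   segment: initial values live in a read-only image (flash on AVR)
   and are copied into RAM before main() runs. The names below are
   illustrative, not real linker symbols. */
static const char flash_image[] = "hello"; /* stand-in for the flash copy */
static char ram_data[sizeof flash_image];  /* the RAM-resident variable   */

void startup_copy_data(void) {
    /* real crt0 code copies the whole .data load image from program
       memory to its linker-assigned RAM addresses in one pass */
    memcpy(ram_data, flash_image, sizeof flash_image);
}
```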
Using PROGMEM and the associated functions, you can directly access the data stored in the flash memory of the AVR device. This constant data is placed in a segment that won't be copied to RAM on startup, like .progmem.data, and so doesn't have space in RAM reserved for it.
The case with the Xtensa architecture, used by the ESP8266 and some members of the ESP32 family, is completely different. Contrary to what you state in your question, I don't believe that static or global objects which are const are copied into RAM by default, only those which can be modified (the .data segment is copied into RAM as initialization, while the .rodata segment is not).

What is the main origin of heap and stack memory division?

I have read a lot of explanations of heap and stack memory, and all of them are obscure when it comes to the origin of the division. First of all, I understand how these memories work with software, but I don't understand the main source of this division. I assume that they are the same unspecialized physical memory, but...
For example, say we have a PC without any OS, and we want to create some bootable program in assembly language for x86. I assume we can do this (personally I don't know assembly, but some people write OSes anyway). So the main question is: can we already operate with heap and stack, or must we create some memory management machinery for them? If yes, how can that be possible in terms of bare-metal programming?
Adding something to the other answer, which is fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. Physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
First: the heap is totally software, while the stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instructions) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separate stack area, totally different from the other RAM areas where a program could create a "heap".
Second: when speaking about "programs", one can/should think of the operating system (the OS) as a program, specialized in managing resources (memory included) and extendable with "applications" (which are programs). In such a scenario, the stack and heap are managed in cooperation by both the OS and the applications.
So, to reply to your main question, the 90% correct answer is: on bare metal you already have a stack; perhaps you have to issue a few short instructions to set it up, but it is straightforward. You don't, however, have a heap: you must implement it in your program. First you set aside some memory to be used as a stack; then you can set aside some more memory to be used as a heap, not forgetting to preserve some memory for normal/static data. The part of the program that manages the heap should know what to do, using but not erratically overwriting the stack and the static data, to perform its functions.
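To make the "heap is purely software" point concrete, here is a toy bump allocator carved out of a static array. It is a sketch only (no free(), hypothetical names), but it shows that a heap is just a convention layered over plain memory:

```c
#include <stddef.h>
#include <stdint.h>

/* A toy "heap": a bump allocator over a static array. On bare metal
   you would reserve a RAM region the same way, and this code alone
   turns it into a (very limited) heap. */
#define HEAP_SIZE 4096
static uint8_t heap_area[HEAP_SIZE];
static size_t heap_top = 0;

void *toy_alloc(size_t n) {
    size_t aligned = (n + 7u) & ~(size_t)7u; /* round up to 8 bytes */
    if (heap_top + aligned > HEAP_SIZE)
        return NULL;                         /* out of memory */
    void *p = &heap_area[heap_top];
    heap_top += aligned;
    return p;
}
```

A real allocator adds free lists, splitting, and coalescing on top of the same idea; the hardware never sees any of it.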

Force memory allocation always to the same virtual address [duplicate]

This question already has answers here: disable the randomness in malloc (6 answers)
I'm experimenting with Pin, an instrumentation tool, which I use to compute some statistics based on memory address of my variables. I want to re-run my program with the information gathered by my instrumentation tool, but for that it's crucial that virtual memory addresses remain the same through different runs.
In general, I should let the OS handle memory allocation, but in this case I need some kind of way to force it to always allocate to the same virtual address. In particular, I'm interested in a very long array, which I'm currently allocating with numa_alloc_onnode(), though I could use something else.
What would be the correct way to proceed?
Thanks
You could try mmap(2).
The instrumented version of your program will use a different memory layout than the original program, because Pin needs memory for the dynamic translation etc. and will change the memory layout (if I recall correctly).
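A sketch of the mmap(2) suggestion, assuming Linux/POSIX. The address passed here (0x600000000000 in the test below) is an arbitrary illustrative hint: without MAP_FIXED the kernel is free to place the mapping elsewhere, so always check where it actually landed.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Ask mmap for memory at (or near) a chosen virtual address.
   Without MAP_FIXED the first argument is only a hint, so callers
   must compare the returned pointer against the requested one. */
void *map_at_hint(void *hint, size_t len) {
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```

Combined with disabling ASLR, mapping the long array at a fixed hint address makes its location reproducible across runs.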
With the exception of address space layout randomization, most memory allocators, loaders, and system routines for assigning virtual memory addresses will return the same results given the same calls and data (not by deliberate design for that but by natural consequence of how software works). So, you need to:
Disable address space layout randomization.
Ensure your program executes in the same way each time.
Address space layout randomization deliberately varies the address space to foil attackers: if the addresses change on each program execution, it is more difficult for attacks to use various exploits to control the code that is executed. It should be disabled only temporarily and only for debugging purposes. This answer shows one method of doing that and links to more information, but the exact method may depend on the version of Linux you are using.
Your program may execute differently for a variety of reasons, such as using threads or using asynchronous signals or interprocess communication. It will be up to you to control that in your program.
Generally, memory allocation is not guaranteed to be reproducible. The results you get may be on an as-is basis.

How to allocate more memory to your program( GCC) [closed]

I want to allocate more memory to my program. What is the gcc flag that allows you to do so?
FYI, what I am trying to do is create a very large matrix (really large) which is going to go through compression algorithms later. So there is no way I can avoid creating such a large matrix to store the data.
Your question is very unclear, but I suspect that you are trying to create a large multidimensional array (matrix) as a local variable (an auto variable) in some function (possibly main), and this is failing.
int foo(int boo, int doo) {
    int big_array[REALLY_BIG];
    ...
}
This fails because C compilers try to make room for variables like this on the program's system stack. A compiler may simply refuse to place something that big on the stack (especially with alignment issues that might make it bigger), or it may generate code that tries to do it, and then either the CPU can't run it because stack-pointer-relative indexing is limited, or the OS has placed limits on the size of the program's stack.
There may be ways to change OS limits, but if it is a CPU limit you are just going to have to do things differently.
For some things, the simplest approach is just to use global or static variables for large data such as this. This way you end up allocating the space for the data either at compile time or at program load time (just prior to run time), but it limits your ability to have more than one copy, since you have to plan ahead and declare enough global variables to hold everything you want live at the same time.
You could also try using malloc or calloc to allocate the memory for you.
A third option is (if you are using a *nix system) to memory map a file containing the matrix. Look into the mmap system call for this.
An added benefit of using mmap or static or global variables is that under most operating systems the virtual memory manager can use the original file (the file containing the matrix for mmap, or the executable file for static or global) as swap space for the memory that the data uses. This makes it so that your program may be able to run without putting too much of a strain on the physical memory or virtual memory manager.
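A sketch of that mmap approach, assuming a POSIX system (the path name is illustrative): the file is first grown to the full matrix size with ftruncate, then mapped read/write so the OS can page pieces in and out using the file itself as backing store.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Back a rows x cols matrix of doubles with a file; index element
   (r, c) as m[r * cols + c]. */
double *map_matrix_file(const char *path, size_t rows, size_t cols) {
    size_t bytes = rows * cols * sizeof(double);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) != 0) { /* grow file to full size */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close */
    return (p == MAP_FAILED) ? NULL : (double *)p;
}
```

With MAP_SHARED, writes to the matrix eventually land back in the file, so the data also survives the process.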
If the matrix is really large you might have to allocate memory in smaller segments so it can find room in the virtual memory space. On 32-bit Windows I have found you simply cannot get anything bigger than about 980 MB in a single allocation. On Linux it is pushing it to try to get more than about 1.5 GB.
In a 64-bit system you can get a lot more.
But in any case, I would recommend using a matrix library that can handle the memory and algorithms for you. There are many subtle tricks to making fast matrix computations. Tricks with threads, computing in cache-sized blocks, prefetching data, SSE vector ops, etc.
You might want to look into using the math libraries from either Intel or AMD.
You don't need any special gcc flags.
Use malloc to allocate your array dynamically at runtime.
If you are somehow forced to use a static array, or if your environment is set up by default to limit your program's access to virtual memory, you may need to use the ulimit command.
ulimit -v unlimited
ulimit -d unlimited
Otherwise, you need to specify more clearly the error you are getting that prevents you from getting sufficient memory, and probably also tell us how big your matrix is.
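The dynamic allocation suggested above can be sketched as a single contiguous heap block, indexed as m[r * cols + c], which sidesteps the stack-size limits that doom large local arrays (alloc_matrix is an illustrative name, not a standard function):

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate a rows x cols matrix of doubles on the heap,
   zero-initialized; index element (r, c) as m[r * cols + c]. */
double *alloc_matrix(size_t rows, size_t cols) {
    if (cols != 0 && rows > (size_t)-1 / cols / sizeof(double))
        return NULL; /* rows * cols * sizeof(double) would overflow */
    return calloc(rows * cols, sizeof(double));
}
```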
Use the heap! malloc() and friends are your friends.

Linux C debugging library to detect memory corruptions [closed]

When working some time ago on an embedded system with a simple MMU, I used to program this MMU dynamically to detect memory corruption.
For instance, at some moment at runtime, the foo variable was overwritten with some unexpected data (probably through a dangling pointer or whatever). So I added the following additional debugging code:
at init, the memory used by foo was indicated as a forbidden region to the MMU;
each time foo was accessed on purpose, access to the region was allowed just before and forbidden again just after;
an MMU IRQ handler was added to dump the bus master and the address responsible for the violation.
This was actually some kind of watchpoint, but handled directly by the code itself.
Now I would like to reuse the same trick on an x86 platform. The problem is that I am very far from understanding how the MMU works on this platform and how it is used by Linux, but I wonder whether any library/tool/system call already exists to deal with this problem.
Note that I am aware that various tools such as Valgrind or GDB exist for tracking memory problems, but as far as I know, none of these tools can be dynamically reconfigured by the debugged code.
I am mainly interested for user space under Linux, but any info on kernel mode or under Windows is also welcome!
You can use the mmap (MAP_ANONYMOUS) and mprotect functions to manipulate the virtual memory system and apply the corresponding protection flags. Your variables need to be constrained to a multiple of the system page size, of course; lots of small variables will incur a significant overhead.
Of course, your application needs to manage the access rights to the memory regions correctly. You also need to use mmap() instead of malloc for the protected regions.
This is the user space interface layer to the MMU, in a relatively portable fashion.
mmap and mprotect
Two good options:
dmalloc is a library that replaces malloc() and free() with extensive debugging versions, capable of using page boundaries to detect memory overruns/underruns, filling allocated and freed memory, leak-checking, and more.
valgrind is a memory debugger that allows very precise memory debugging (detecting accurately any out-of-bounds access) at the expense of program speed (programs run substantially slower under it). It can also do leak checking.
I think the best you're going to be able to do is to fire off a watchdog thread that keeps a copy of the value and continually compares its copy to the working value. You won't be able to catch exactly when the value is overwritten, but you'll be notified to within whatever granularity you want (i.e., if you set the thread to check every 10ms you'll be notified within 10ms).
The mprotect() system call is what you're after. This lets you change the protections on a memory region.
Memory protection on x86 under Linux is done at the level of a page (4096 bytes), so you will have to arrange for your protected variable to live on its own page(s), not sharing with any other variables. One way to arrange this is to use posix_memalign() to allocate the memory for the variable, using 4096 as the alignment and rounding the size up to the next multiple of 4096. (Actually, you can use sysconf(_SC_PAGESIZE) to determine the page size in a portable manner, rather than using a hardcoded value.) Another way is to allocate the variable within a union that pads it out to a multiple of the page size, and use the gcc attribute __attribute__((aligned(4096))) to align the variable.
In place of your MMU IRQ handler, you simply install a signal handler for the SIGSEGV signal using the sa_sigaction member of the structure passed to the sigaction() function. Your signal handler will be passed a siginfo_t structure as its second argument, whose si_addr member contains the address of the faulting memory access.
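Putting these pieces together, here is a hedged Linux sketch (watch_demo and the variable names are illustrative; note that calling mprotect from a signal handler is not formally async-signal-safe, though it works in practice for a synchronous SIGSEGV): the page is protected with PROT_NONE, the handler inspects si_addr and re-enables access, and the faulting write is transparently retried.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded;    /* page-aligned "variable" under watch */
static size_t page_size;
static volatile sig_atomic_t fault_count = 0;

static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr; /* faulting data address */
    if (addr >= guarded && addr < guarded + page_size) {
        fault_count++;
        /* re-enable access so the faulting write is retried */
        mprotect(guarded, page_size, PROT_READ | PROT_WRITE);
    } else {
        _exit(1); /* a real, unexpected crash */
    }
}

int watch_demo(void) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    if (posix_memalign((void **)&guarded, page_size, page_size) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_segv;
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    mprotect(guarded, page_size, PROT_NONE); /* forbid all access */
    guarded[0] = 42; /* faults once; handler re-opens the page */
    return (fault_count == 1 && guarded[0] == 42) ? 0 : -1;
}
```

In a real debugging setup you would toggle the protection around the legitimate accesses to foo, and log si_addr (plus the instruction pointer from the ucontext_t argument) instead of silently re-opening the page.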
Electric Fence is sort of old, but still maintained and useful. A number of people have used it as the starting point for more complex debugging. It's extremely easy to modify.
I am also a huge fan of Valgrind, but Valgrind isn't available on all platforms.
