Need of executable stack and heap memory - c

As we know that making the stack and the heap area of the virtual memory non-executable can prevent the execution of malicious code (like a shellcode) inside the memory (the technique is called Data Execution Prevention). And, the simplest way to inject the malicious code inside the memory is by overflowing the buffer. Thus, making these areas of the memory non-executable can help in reducing the severity of overflow attacks.
However, there are many other techniques like address space randomization, pointer protection, use of canaries etc. that are used to prevent such attacks. I think most of the system make use of these other methods instead of making the stack/heap memory non-executable.(Please correct me if I am wrong here)
Now, my question is, are there some specific operations or special cases in which the stack/heap parts of memory are required to be executable?

JITs map writeable and executable regions of memory or simply mprotect previously allocated memory to make it executable.
GCC used to require an a system dependent method to mark parts of the stack executable for their trampoline code. This was 12 years ago though, I don't know how it's done today.
Dynamic linking on many systems also needs an ability to write to a jump table for function calls resolved during run time. If you want to have the jump table non-writeable between updates to the table that can be quite costly.
Generally it's possible to solve those problems safely by trying to enforce a policy where memory is writeable or executable, but never both. Memory can be remapped to be writeable when the write needs to be done and then protected again to make it executable. It trades off some performance (not that much) for better security and slightly more complex code.

Related

What is the main origin of heap and stack memory division?

I read a lot of explanation of heap and stack memory, and all of them obscure anyway in terms of origin. First of all I understand how this memories works with software, but I don't understand the main source of this division. I assume that they are the same unspecialized physical memory, but...
For example say we have PC without any OS, and we want create some bootable program with assembly language for x86. I assume we can do this (Personally I don't know assembly, but some people write OS anyway). So the main question is Can we already operate with heap and stack, or we must create some memory managment machinery for this? If yes, so how it can be possible in terms of bare metal programming?
Adding something to the other answer, fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
1^ thing. Heap is totally software, while stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instruction) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separated stack area - totally different from other ram areas where the program could create a "heap".
2^ thing. Whean speaking about "programs", one can/should think that the operating system (the OS) is a program, specialized in managing resources (memory included), and extendable with "applications" (which are programs). In such scenario, stack and heap are managed in cooperation from both OS and the applications.
So, to reply to your main question, the 90% correct answer is: in bare metal you have already a stack - perhaps you have to issue some short instruction to set it up, but it is straightforward. But you don't have a heap, you must implement it in your program. First you set aside some memory to be used as a stack; and then you can set aside some more memory to be used as a heap, not forgetting that you must preserve some memory for normal/static data. The part of the program that manages the heap should know what to do, using but not erratically overwriting the stack and the static data, to perform its functions.

Way to detect that stack area is not overlapping RAM area during runtime

Is there any way to check or prevent stack area from crossing the RAM data (.data or .bss) area in the limited memory (RAM/ROM) embedded systems comprising microcontrollers? There are tools to do that, but they come with very costly license fees like C-STAT and C-RUN in IAR.
You need no external tools to view and re-map your memory layout. The compiler/linker you are using should provide means of doing so. How to do this is of course very system-specific.
What you do is to open up the system-specific linker file in which all memory segments have been pre-defined to a default for the given microcontroller. You should have the various RAM segments listed there, de facto standard names are: .stack .data .bss and .heap.
Each such segment will have an address range specified. Change the addresses and you will move the segments. However, these linker files usually have some obscure syntax that you need to study before you touch anything. If you are (un)lucky it uses GNU linker scripts, which is a well-documented, though rather complex standard.
There could also be some manufacturer-supplied start-up code that sets the stack pointer. You might have to modify that code manually, in addition to tweaking the linker file.
Regarding the stack: you need to check the CPU core manual and see if the stack pointer moves upwards or downwards on your given system. Most common is downwards, but the alternative exists. You should ensure that in the direction that the stack grows, there is no other read/write data segment which it can overwrite upon stack overflow. Ideally the stack should overflow into non-mapped memory where access would cause a CPU hardware interrupt/exception.
Here is an article describing how to do this.
In small micros that do not have the necessary hardware support for this, a very simple method is to have a periodic task (either under a multitasker or via a regular timed interrupt) check the 'threshold' RAM address which you must have initialized to some 'magic' pattern, like 0xAA55
Once the periodic task sees this memory address change contents, you have a problem!
In microcontrollers with limited resources, it is always a good idea to prevent stack overflow via simple memory usage optimizations:
Reduce overall RAM usage by storing read-only variables in non-volatile (e.g. flash) memory. A good target for this are constant strings in your code, like the ones used on printf() format strings, for example. This can free a lot of memory for your stack to grow. Check you compiler documentation about how to allocate these variables in flash.
Avoid recursive calls - they are not a good idea in resource-constrained or safety-critical systems, as you have little control over how the stack grows.
Avoid passing large parameters by value in function calls - pass them as const references whenever possible (e.g. for structs or classes).
Minimize unnecessary usage of local variables. Look particularly for the large ones, like local buffers for example. Often you can find ways to just remove them, or to use a shared resource instead without compromising your code.

Force memory allocation always to the same virtual address [duplicate]

This question already has answers here:
disable the randomness in malloc
(6 answers)
Closed 9 years ago.
I'm experimenting with Pin, an instrumentation tool, which I use to compute some statistics based on memory address of my variables. I want to re-run my program with the information gathered by my instrumentation tool, but for that it's crucial that virtual memory addresses remain the same through different runs.
In general, I should let the OS handle memory allocation, but in this case I need some kind of way to force it to always allocate to the same virtual address. In particular, I'm interested in a very long array, which I'm currently allocating with numa_alloc_onnode(), though I could use something else.
What would be the correct way to proceed?
Thanks
You could try mmap(2).
The instrumented version of your program will use a different memory layout than the original program because pin needs memory for the dynamic translation etc. and will change the memory layout. (if I recall correctly)
With the exception of address space layout randomization, most memory allocators, loaders, and system routines for assigning virtual memory addresses will return the same results given the same calls and data (not by deliberate design for that but by natural consequence of how software works). So, you need to:
Disable address space layout randomization.
Ensure your program executes in the same way each time.
Address space layout randomization is deliberate changes to address space to foil attackers: If the addresses are changed in each program execution, it is more difficult for attacks to use various exploits to control the code that is executed. It should be disabled only temporarily and only for debugging purposes. This answer shows one method of doing that and links to more information, but the exact method may depend on the version of Linux you are using.
Your program may execute differently for a variety of reasons, such as using threads or using asynchronous signals or interprocess communication. It will be up to you to control that in your program.
Generally, memory allocation is not guaranteed to be reproducible. The results you get may be on an as-is basis.

Do we really have to free() when we malloc()? What makes it different from an automatic variable then?

The OS will just recover it (after the program exits) right? So what's the use other than good programming style? Or is there something I'm misunderstanding? What makes it different from "automatic" allocation since both can be changed during run time, and both end after program execution?
When your application is working with vast amounts of data, you must free in order to conserve heap space. If you don't, several bad things can happen:
the OS will stop allocating memory for you (crashing)
the OS will start swapping your data to disk (thrashing)
other applications will have less space to put their data
The fact that the OS collects all the space you allocate when the application exits does not mean you should rely upon this to write a solid application. This would be like trying to rely on the compiler to optimize poor programming. Memory management is crucial for good performance, scalability, and reliability.
As others have mentioned, malloc allocates space in the heap, while auto variables are created on the stack. There are uses for both, but they are indeed very different. Heap space must be allocated and managed by the OS and can store data dynamically and of different sizes.
If you call a macro for thousand times without using free() then compiler or safe to say system will assign you thousand different address, but if you use free() after each malloc then only one memory address will be given to you every time.
So chances of memory leak, bus error, memory out of bound and crash would be minimum.
Its safe to use free().
In C/C++ "auto" variables are allocated on the stack. They are destroyed right at the exit from the function. This will happen automatically. You do not need to write anything for this.
Heap allocations (result of a call to malloc) are either released explicitly (with a call to free) or they are cleaned up when the process ends.
If you are writing small program that will be used maybe once or twice, then it is ok not to free your heap allocations. This is not nice but acceptable.
If you are writing medium or big project or are planning to include your code into other project, you should definitely release every heap allocation. Not doing this will create HUGE trouble. The heap memory is not endless. Program may use it all. Even if you will allocate small amount of memory, this will still create unnedded pressure on the OS, cause swapping, etc.
The bottom line: freeing allocations is much more than just a style or a good habit.
An automatic variable is destroyed (and its memory is re-usable) as soon as you exit the scope in which it is defined. For most variables that's much earlier than program exit.
If you malloc and don't free, then the memory isn't re-usable until the program exits. Not even then, on some systems with very minimal OS.
So yes, there's big difference between an automatic variable and a leaked memory allocation. Call a function that leaks an allocation enough times, and you'll run out of memory. Call a function with an automatic variable in it as many times as you like, the memory is re-usable.
It is good programming style and it's more than that. Not doing proper memory management in non-trivial programs will eventually influence the usability of your program. Sure the OS can reclaim any resources that you've allocated/used after your program terminates, but that doesn't alleviate the burden or potential issues during program execution.
Consider the web browser that you've used to post this question: if the browser is written in a language that requires memory management, and the code didn't do it properly, how long do you think it would be before you'd notice that it's eating up all your memory? How long do you think the browser would remain usable? Now consider that users often leave browsers open for long periods of time: without proper memory management, they would become unusable after few page loads.
If your program does not exit immediately and you're not freeing your memory you're going to end up wasting it. Either you'll run out of memory eventually, or you'll start swapping to disk (which is slow, and also not unlimited).
automatic variable is on the stack and its size should be known on compilation time. if you need to store data that you don't the size, for example, maintain a binary tree, where the user add and removes objects. beside that stack size might be limited (depends on your target), for example, linux kernel the stack is 4k-8k usually. you also trash the instruction cache, which affects performance,
Yes you absolutely have to use free() after malloc() (as well as closing files and other resources when you're done). While it's true that the OS will recover it after execution, a long running process will leak memory that way. If your program is as simple as a main method that runs a single method then exists, it's probably not a big deal, albeit incredibly sloppy. You should get in the habit of managing memory properly in C because one day you may want to write a nontrivial program that runs for more than a second, and if you don't learn how to do it in advance, you'll have a huge headache dealing with memory leaks.

Determine total memory usage of embedded C program

I would like to be able to debug how much total memory is being used by C program in a limited resource environment of 256 KB memory (currently I am testing in an emulator program).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much my C program is using (including globals, local variables [from perspective of my main function loop], the program code itself etc..)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is Hitachi SH2, I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options make the proper actions (mark a checkobx, probably) so that the build process (namely, the linker) will generate a map file.
A map file of an embedded system will normally give you the information you need in a detailed fashion: The memory segments, their sizes, how much memory is utilzed in each one, program memory, data memory, etc.. There is usually a lot of data supplied by the map file, and you might need to write a script to calculate exactly what you need, or copy it to Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives that, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint in the start of you program. When the application stops there zero the entire stack. Resume the application and let it work for a while. Finally stop it and inspect the stack memory. You will see non-zero values instead of zeros. The used stack goes until the zeros part starts again.
Generally you will have different sections in mmap generated file, where data goes, like :
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text .... and so on!!!
with other attributes like Base,Size(hex),Size(dec) etc for each section.
While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part id dynamically allocated data, but even then sich data is allocated from the heap, which in most bare-metal, single-threaded environments is a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file is generated and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst case stack usage for any particular function. In a single threaded environment, the stack usage from main() will give worst case overall usage (although interrupt handlers need consideration, the linker is not thread or interrupt aware, and some architectures have separate interrupt stacks, some are shared).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().

Resources