Why code in stack or heap segment can be executed? - c

In the security field, there are heap exploitation and stack smashing attack.
But I found that /proc/*/maps file, the heap and stack segment,
only have rw-p-permission.
There is no execution permission in the these two segments.
My engineer friends told me that if you have rw permission in the Intel CPU, your code will got the execution permission automatically.
But I can not understand why Intel do this design?

That is because all segments in Linux (Windows also) have the same base address and the same size. Code is always accessed via code segment and code segment covers exactly the same area as stack (or any other) segment, so you can execute code wherever it is.
EDIT:
you can read more here: http://www.intel.com/Assets/en_US/PDF/manual/253668.pdf
Chapter 3.2 USING SEGMENTS

Related

What is the main origin of heap and stack memory division?

I read a lot of explanation of heap and stack memory, and all of them obscure anyway in terms of origin. First of all I understand how this memories works with software, but I don't understand the main source of this division. I assume that they are the same unspecialized physical memory, but...
For example say we have PC without any OS, and we want create some bootable program with assembly language for x86. I assume we can do this (Personally I don't know assembly, but some people write OS anyway). So the main question is Can we already operate with heap and stack, or we must create some memory managment machinery for this? If yes, so how it can be possible in terms of bare metal programming?
Adding something to the other answer, fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
1^ thing. Heap is totally software, while stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instruction) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separated stack area - totally different from other ram areas where the program could create a "heap".
2^ thing. Whean speaking about "programs", one can/should think that the operating system (the OS) is a program, specialized in managing resources (memory included), and extendable with "applications" (which are programs). In such scenario, stack and heap are managed in cooperation from both OS and the applications.
So, to reply to your main question, the 90% correct answer is: in bare metal you have already a stack - perhaps you have to issue some short instruction to set it up, but it is straightforward. But you don't have a heap, you must implement it in your program. First you set aside some memory to be used as a stack; and then you can set aside some more memory to be used as a heap, not forgetting that you must preserve some memory for normal/static data. The part of the program that manages the heap should know what to do, using but not erratically overwriting the stack and the static data, to perform its functions.

Why malloc() returns null, and how much needed or left for the heap on a STM32F407?

I just found out that my decoder library fails to initialize as malloc() fails to allocate memory and returns to the caller with "NULL".
I tried many possible scenarios, with or without casting and referred to a lot of other threads about malloc(), but nothing has worked, until I changed the heap size to 0x00001400, which has apparently solved the problem.
Now, the question is, how can I tell how much heap needed, or left for the program? The datasheet says my MCU has: "Up to 192+4 Kbytes of SRAM including 64-Kbyte of CCM (core coupled memory) data RAM" Could someone explain to me what that means? Changing that to 0x00002000 (8192 bytes) would lead to dozens of the following error:
Error: L6406E: No space in execution regions with .ANY selector
Isn't 8KB of RAM is fraction of fraction of what the device has? Why I can't add more to the heap beyond the 0x00001800?
The program size reported by Keil after compilation is:
Program Size: Code=103648 RO-data=45832 RW-data=580 ZI-data=129340
The error Error: L6406E, is because no enough RAM on your target to support in linker file, there is no magic way to get more RAM, both stack and heap are using RAM memory, But in you case it seems to have more than enough memory but compiler is not aware of same.
My suggestion is to use linker response files with the Keil µVision IDE and update required memory section according to the use..
The linker command (or response) file contains only linker directives. The .OBJ files and .LIB files that are to be linked are not listed in the command file. These are obtained by µVision automatically from your project file.
The best way to start using a linker command file is to have µVision create one for you automatically and then begin making the necessary changes.
To generate a Command File from µVision...
Go to the Project menu and select the Options for Target item.
Click on the L166 Misc or L51 Misc tab to open the miscellaneous linker options.
Check the use linker control file checkbox.
Click on the Create... button. This creates a linker control file.
Click on the Edit... button. This opens the linker control file for editing.
Edit the command file to include the directives you need.
When you create a linker command file, the file created includes the directives you currently have selected.
Regarding malloc() issue you are facing,
The sizes of heap required is based on how much memory required in a application, especially the memory required dynamic memory allocation using malloc and calloc.
please note that some of the C library like "printf" functions are also using dynamic memory allocation under the hood.
If you are using the keil IDE for compiling your source code then you can increase the heap size by modifying the startup file.
;******************************************************************************
;
; <o> Heap Size (in Bytes) <0x0-0xFFFFFFFF:8>
;
;******************************************************************************
Heap EQU 0x00000000
;******************************************************************************
;
; Allocate space for the heap.
;
;******************************************************************************
AREA HEAP, NOINIT, READWRITE, ALIGN=3
__heap_base
HeapMem
SPACE Heap
__heap_limit
;******************************************************************************
if you are using the make enveromennt to build the applicatation then simpely change the HEAP sizse in liner file.
Details regarding same you can get directly from Keil official website, Please check following links,
https://www.keil.com/pack/doc/mw/General/html/mw_using_stack_and_heap.html
http://www.keil.com/forum/11132/heap-size-bss-section/
http://www.keil.com/forum/14201/
BR
Jerry James.
Now, the question is, how can I tell how much heap needed, or left for the program?
That is two separate questions.
The amount of heap needed is generally non-deterministic (one reason for avoiding dynamic memory allocation in most cases in embedded systems with very limited memory) - it depends entirely on the behaviour of your program, and if your program has a memory leak bug, even knowledge of the intended behaviour won't help you.
However, any memory not allocated statically by your application can generally be allocated to the heap, otherwise it will remain unused by the C runtime in any case. In other toolchains, it is common for the linker script to automatically allocate all unused memory to the heap, so that it is as large as possible, but the default script and start-up code generated by Keil's ARM MDK does not do that; and if you make it as large as possible, then modify the code you may have to adjust the allocation each time - so it is easiest during development at least to leave a small margin for additional static data.
The datasheet says my MCU has: "Up to 192+4 Kbytes of SRAM including 64-Kbyte of CCM (core coupled memory) data RAM" Could someone explain to me what that means?
Another problem is that the ARM MDK C library's malloc() implementation requires a contiguous heap and does not support the addition of arbitrary memory blocks (as far as I have been able to determine in any case), so the 64Kb CCM block cannot be used as heap memory unless the entire heap is allocated there. The memory is in fact segmented as follows:
SRAM1 112 kb
SRAM2 16 kb
CCM 64 kb
BKUPSRAM 4 kb
SRAM 1/2 are contiguous but on separate buses (which can be exploited to support DMA operations without introducing wait-states for example).
The CCM mmeory cannot be used for DMA or bit-banding, and the default ARM-MDK generated linker script does not map it at all, so to utilise it you must use a custom linker script, and then ensure that any DMA or bit-banded data are explicitly located in one of the other regions. If your heap need not be more than 64kb you could locate it there but to do that needs a modification of the start-up assembler code that allocates the heap.
The 4Kb backup SRAM is accessed as a peripheral and is mapped in the peripheral register space.
With respect to determining how much heap remains at run-time, the ARM library provides a somewhat cumbersome __heapstats function. Unfortunately it does not simply return the available freespace (it is not quite as simple as that because heap free space is not on its own particularly useful since block fragmentation can least to allocation failure even if cumulatively there is sufficient memory). __heapstats requires a pointer to an fprintf()-like function to output formatted text information on heap state. For example:
void heapinfo()
{
typedef int (*__heapprt)(void *, char const *, ...);
__heapstats( (__heapprt)fprintf, stdout ) ;
}
Then you might write:
mem = malloc( some_space ) ;
if( mem == NULL )
{
heapinfo() ;
for(;;) ; // wait for watchdog or debugger attach
}
// memory allocated successfully
Given:
Program Size: Code=103648 RO-data=45832 RW-data=580 ZI-data=129340
You have used 129920 of the available 131652 bytes, so could in theory add 1152 bytes to the heap, but you would have to keep changing this as the ammount of static data changed as you modified your code. Part of the ZI (zero initialised) data is your heap allocation, everything else is your application stack and static data with no explicit initialiser. The full link map generated by the linker will show what is allocated statically.
It may be possible to increase heap size by reducing stack allocation. The ARM linker can generate stack usage analysis in the link map (as described here) to help "right-size" your stack. If you have excessive stack allocation, this may help. However stack-overflow errors are even more difficult to detect and debug than memory allocation failure and call by function-pointer and interrupt processing will confound such analysis, so leave a safety margin.
It would perhaps be better to use a customised linker script and modify the heap location in the start-up code to locate the heap in the otherwise unused CCM segment (and be sure you do not use dynamic memory for either DMA or bit-banding). You can then safely create a 64Kb heap assuming you locate nothing else there.

Where memory segments are defined?

I just learned about different memory segments like: Text, Data, Stack and Heap. My question is:
1- Where the boundaries between these sections are defined? Is it in Compiler or OS?
2- How the compiler or OS know which addresses belong to each section? Should we define it anywhere?
This answer is from the point of view of a more special-purpose embedded system rather than a more general-purpose computing platform running an OS such as Linux.
Where the boundaries between these sections are defined? Is it in Compiler or OS?
Neither the compiler nor the OS do this. It's the linker that determines where the memory sections are located. The compiler generates object files from the source code. The linker uses the linker script file to locate the object files in memory. The linker script (or linker directive) file is a file that is a part of the project and identifies the type, size and address of the various memory types such as ROM and RAM. The linker program uses the information from the linker script file to know where each memory starts. Then the linker locates each type of memory from an object file into an appropriate memory section. For example, code goes in the .text section which is usually located in ROM. Variables go in the .data or .bss section which are located in RAM. The stack and heap also go in RAM. As the linker fills one section it learns the size of that section and can then know where to start the next section. For example, the .bss section may start where the .data section ended.
The size of the stack and heap may be specified in the linker script file or as project options in the IDE.
IDEs for embedded systems typically provide a generic linker script file automatically when you create a project. The generic linker file is suitable for many projects so you may never have to customize it. But as you customize your target hardware and application further you may find that you also need to customize the linker script file. For example, if you add an external ROM or RAM to the board then you'll need to add information about that memory to the linker script so that the linker knows how to locate stuff there.
The linker can generate a map file which describes how each section was located in memory. The map file may not be generated by default and you may need to turn on a build option if you want to review it.
How the compiler or OS know which addresses belong to each section?
Well I don't believe the compiler or OS actually know this information, at least not in the sense that you could query them for the information. The compiler has finished its job before the memory sections are located by the linker so the compiler doesn't know the information. The OS, well how do I explain this? An embedded application may not even use an OS. The OS is just some code that provides services for an application. The OS doesn't know and doesn't care where the boundaries of memory sections are. All that information is already baked into the executable code by the time the OS is running.
Should we define it anywhere?
Look at the linker script (or linker directive) file and read the linker manual. The linker script is input to the linker and provides the rough outlines of memory. The linker locates everything in memory and determines the extent of each section.
For your Query :-
Where the boundaries between these sections are defined? Is it in Compiler or OS?
Answer is OS.
There is no universally common addressing scheme for the layout of the .text segment (executable code), .data segment (variables) and other program segments. However, the layout of the program itself is well-formed according to the system (OS) that will execute the program.
How the compiler or OS know which addresses belong to each section? Should we define it anywhere?
I divided your this question into 3 questions :-
About the text (code) and data sections and their limitation?
Text and Data are prepared by the compiler. The requirement for the compiler is to make sure that they are accessible and pack them in the lower portion of address space. The accessible address space will be limited by the hardware, e.g. if the instruction pointer register is 32-bit, then text address space would be 4 GiB.
About Heap Section and limit? Is it the total available RAM memory?
After text and data, the area above that is the heap. With virtual memory, the heap can practically grow up close to the max address space.
Do the stack and the heap have a static size limit?
The final segment in the process address space is the stack. The stack takes the end segment of the address space and it starts from the end and grows down.
Because the heap grows up and the stack grows down, they basically limit each other. Also, because both type of segments are writeable, it wasn't always a violation for one of them to cross the boundary, so you could have buffer or stack overflow. Now there are mechanism to stop them from happening.
There is a set limit for heap (stack) for each process to start with. This limit can be changed at runtime (using brk()/sbrk()). Basically what happens is when the process needs more heap space and it has run out of allocated space, the standard library will issue the call to the OS. The OS will allocate a page, which usually will be manage by user library for the program to use. I.e. if the program wants 1 KiB, the OS will give additional 4 KiB and the library will give 1 KiB to the program and have 3 KiB left for use when the program ask for more next time.
Most of the time the layout will be Text, Data, Heap (grows up), unallocated space and finally Stack (grows down). They all share the same address space.
The sections are defined by a format which is loosely tied to the OS. For example on Linux you have ELF and on Mac OS you have Mach-O.
You do not define the sections explicitly as a programmer, in 99.9% of cases. The compiler knows what to put where.

memory starting location in C [duplicate]

This question already has an answer here:
Why do virtual memory addresses for linux binaries start at 0x8048000?
(1 answer)
Closed 8 years ago.
I am looking into to the memory layout of a given process. I notice that the starting memory location of each process is not 0. On this website, TEXT starts at 0x08048000. One reason can be to distinguish the address with the NULL pointer. I am just wondering if there is any another good reasons? Thanks.
The null pointer doesn't actually have to be 0. It's guaranteed in the C standard that when a 0 value is given in the context of a pointer it's treated as NULL by the compiler.
But the 0 that you use in your source code is just syntactic sugar that has no relation to the actual physical address the null-pointer value is "pointing" to.
For further details see:
Why is NULL/0 an illegal memory location for an object?
Why is address zero used for the null pointer?
An application on your operating system has its unique address space, which it sees as a continuous block of memory (the memory isn't physically continuous, it's just "the impression" the operating system gives to every program).
For the most part, each process's virtual memory space is laid out in a similar and predictable manner (this is the memory layout in a Linux process, 32-bit mode):
(image from Anatomy of a Program in Memory)
Look at the text segment (the default .text base on x86 is 0x08048000, chosen by the default linker script for static binding).
Why the magical 0x08048000? Likely because Linux borrowed that address from the System V i386 ABI.
... and why then did System V use 0x08048000?
The value was chosen to accommodate the stack below the .text section,
growing downward. The 0x48000 bytes could be mapped by the same page
table already required by the .text section (thus saving a page table
in most cases), while the remaining 0x08000000 would allow more room
for stack-hungry applications.
Is there anything below 0x08048000? There could be nothing (it's only 128M), but you can pretty much map anything you desire there, using the mmap() system call.
See also:
What's the memory before 0x08048000 used for in 32 bit machine?
Reorganizing the address space
mmap
I think this sums it up:
Each process has its own set of page tables, but there is a catch. Once virtual addresses are enabled, they apply to all software running in the machine, including the kernel itself. Thus a portion of the virtual address space must be reserved to the kernel.
So while the process gets it's own address space. Without allocating a block to the kernel, it would not be able to address kernel code and data.
This is always the first block of memory it appears and so includes address 0. The user mode space starts beyond this, and so that is where both the stack and heap reside.
Distinguishing from NULL pointer
Even if the user mode space started at address 0, there would not be any data allocated to the address 0 as that will be in the stack or the heap which themselves do not start at the beginning of the user area. Therefore NULL (with the value of 0) could be used still and is not a reason for this layout.
However one benefit related to the NULL and the first block being kernel memory is any attempt to read/write to NULL throws a Segmentation Fault.
A loader loads a binary in segments into memory: text (constants), data, code. There is no need to start from 0, and as C is has the problem from bugs accessing around null, like in a[i] that is even dangerous. This allows (on some processors) to intercept segmentation faults.
It would be the C runtime introducing a linear address space from 0. That might be imaginable where C is the operating system's implementation language. But serves no purpose; to have the heap start from 0. The memory model is one of segments. A code segment might be protected against modification by some processors.
And in segments allocation happens in C runtime managed memory blocks.
I might add, that physical 0 and upwards is often used by the operating system itself.

Determine Stack bottom, start and end of data segment of C program

I am trying to understand how memory space is allocated for a C program. For that , I want to determine stack and data segment boundaries. Is there any library call or system call which does this job ? I found that stack bottom can be determined by reading /proc/self/stat. However, I could not find how to do it. Please help. :)
Processes don't have a single "data segment" anymore. They have a bunch of mappings of memory into their address space. Common cases are:
Shared library or executable code or rodata, mapped shared, without write access.
Glibc heap segments, anonymous segments mapped with rw permissions.
Thread stack areas. They look a lot like heap segments, but are usually separated from each other with some unmapped guard pages.
As Nikolai points out, you can look at the list of these with the pmap tool.
Look into /proc/<pid>/maps and /proc/<pid>/smaps (assuming Linux). Also pmap <pid>.
There is no general method for doing this. In fact, some of the secure computing environments randomize the exact address space allocations and order so that code injection attacks are more challenging to engineer.
However, every C runtime library has to arrange the contributions of data and stack segments so the program works correctly. Reading the runtime startup code is the most direct way of finding the answer.
Which C compiler are you interested in?

Resources