Force memory allocation always to the same virtual address [duplicate] - c

This question already has answers here:
disable the randomness in malloc
(6 answers)
Closed 9 years ago.
I'm experimenting with Pin, an instrumentation tool, which I use to compute some statistics based on the memory addresses of my variables. I want to re-run my program with the information gathered by my instrumentation tool, but for that it's crucial that the virtual memory addresses stay the same across different runs.
In general, I should let the OS handle memory allocation, but in this case I need some kind of way to force it to always allocate to the same virtual address. In particular, I'm interested in a very long array, which I'm currently allocating with numa_alloc_onnode(), though I could use something else.
What would be the correct way to proceed?
Thanks

You could try mmap(2).
The instrumented version of your program will use a different memory layout than the original program, because Pin itself needs memory for dynamic translation etc. and will change the layout (if I recall correctly).
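For illustration, here is a minimal sketch of reserving the big array at a fixed virtual address with mmap(2) and MAP_FIXED. The address and size below are arbitrary placeholders; you would pick a range that is unused in your process (check /proc/self/maps), since MAP_FIXED silently replaces whatever is already mapped there. If you still need the NUMA placement that numa_alloc_onnode() gave you, the mapping could afterwards be bound to a node (e.g. with mbind()), which is not shown here.

    /* Minimal sketch: map the big array at a fixed virtual address.
     * FIXED_ADDR and ARRAY_BYTES are placeholders; choose a range that is
     * free in your process, because MAP_FIXED silently replaces any
     * existing mapping in that range.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define FIXED_ADDR  ((void *)0x600000000000UL)  /* placeholder address */
    #define ARRAY_BYTES (1UL << 30)                 /* placeholder: 1 GiB  */

    int main(void)
    {
        void *p = mmap(FIXED_ADDR, ARRAY_BYTES,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                       -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return EXIT_FAILURE;
        }
        printf("array mapped at %p\n", p);  /* same address on every run */
        /* ... use p as the backing store for the long array ... */
        munmap(p, ARRAY_BYTES);
        return 0;
    }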

With the exception of address space layout randomization, most memory allocators, loaders, and system routines for assigning virtual memory addresses will return the same results given the same calls and data (not by deliberate design, but as a natural consequence of how software works). So, you need to:
Disable address space layout randomization.
Ensure your program executes in the same way each time.
Address space layout randomization deliberately changes the address space layout to foil attackers: if the addresses change on each program execution, it is more difficult for exploits to control the code that is executed. It should be disabled only temporarily and only for debugging purposes. This answer shows one method of doing that and links to more information, but the exact method may depend on the version of Linux you are using.
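As an illustration only (not necessarily the method the linked answer describes): on Linux, one way to disable randomization for a single program is to launch it through a small wrapper that clears the randomization flag with personality(2) before exec, which is essentially what setarch -R does. A minimal sketch:

    /* Tiny launcher that disables ASLR for a child program (Linux-specific),
     * similar in spirit to running it under "setarch <arch> -R".
     * Usage: ./norandom ./your_program args...
     */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/personality.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
            return 1;
        }
        int persona = personality(0xffffffff);      /* query current persona */
        if (personality(persona | ADDR_NO_RANDOMIZE) == -1) {
            perror("personality");
            return 1;
        }
        execvp(argv[1], &argv[1]);
        perror("execvp");                           /* only reached on failure */
        return 1;
    }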
Your program may execute differently for a variety of reasons, such as using threads or using asynchronous signals or interprocess communication. It will be up to you to control that in your program.
Generally, memory allocation is not guaranteed to be reproducible; any determinism you get is essentially on an as-is basis.

Related

How to know/limit static stack size in C program with GCC/Clang compiler? [duplicate]

This question already has answers here:
How to determine maximum stack usage in embedded system with gcc?
(7 answers)
Closed 1 year ago.
I'm writing an embedded program that uses a static, limited stack area of a known size (in other words, I have X bytes for the stack, and there's no underlying OS that can allocate more stack on demand for me). I want to avoid errors at runtime and catch them at build time instead - to have some indication if I mistakenly declared too many variables in some function block, so that they won't fit in the stack at runtime.
Given that I don't use recursive calls in my program, can I somehow know at compilation time how much stack space all my local variables will take on the deepest function call path? Or at least how much space my variables will take in a single block (function), if the compiler is not smart enough to analyze it across all the nested calls?
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on stack all my local variables will take on the deepest function call path?
Only if you don't use interrupts - and using interrupts is extremely likely in any embedded system. So you'll have to find out stack use with dynamic analysis.
The old-school way is to set your whole stack area to a value like 0xAA upon reset from a debugger, then let the program run for a while, making sure to provoke all use cases. Then halt and inspect how far down the memory still holds 0xAA. It isn't a 100% scientific, fool-proof method, but it works just fine in practice in the vast majority of cases.
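For illustration, the same 0xAA painting can be done by the firmware itself instead of from a debugger. This is only a sketch: it assumes a descending stack, and the symbols __stack_start__ and __stack_end__ are made-up names standing in for whatever your linker script calls the lowest and highest addresses of the stack region.

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed linker-script symbols marking the stack region (names are
     * placeholders - use whatever your linker script actually defines). */
    extern uint8_t __stack_start__;   /* lowest stack address  */
    extern uint8_t __stack_end__;     /* highest stack address */

    #define STACK_PAINT 0xAAu

    /* Call very early after reset: paint everything from the bottom of the
     * stack region up to just below the currently live frames. */
    void stack_paint(void)
    {
        uint8_t *p   = &__stack_start__;
        uint8_t *top = (uint8_t *)&p - 64;   /* rough "current SP" minus headroom */
        while (p < top)
            *p++ = STACK_PAINT;
    }

    /* Call later (periodically, or before halting): bytes that still hold
     * the paint were never touched, so this is the remaining headroom. */
    size_t stack_unused_bytes(void)
    {
        const uint8_t *p = &__stack_start__;
        size_t n = 0;
        while (p < &__stack_end__ && *p == STACK_PAINT) {
            ++n;
            ++p;
        }
        return n;
    }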
Other methods involve setting write breakpoints at certain stack locations where you don't expect the program to end up, sort of like a "hardware stack canary". Run the program and ensure that the breakpoint never triggers. If it does, then investigate from there, move the breakpoint further down the memory map to see exactly where.
Another good practice is to always memory map your stack so that it can only overflow into forbidden memory or at least into read-only flash etc - ideally you'd get a hardware exception for stack overflow. You definitely want to avoid the stack overflowing into other RAM sections like .data/.bss, as that will cause severe and extremely subtle error scenarios.

What is the main origin of heap and stack memory division?

I have read a lot of explanations of heap and stack memory, and all of them are vague about where this division actually comes from. I understand how these memories are used by software, but I don't understand the origin of the division itself. I assume they are the same unspecialized physical memory, but...
For example, say we have a PC without any OS, and we want to create some bootable program in x86 assembly language. I assume we can do this (personally I don't know assembly, but some people write OSes anyway). So the main question is: can we already operate with a heap and a stack, or must we first create some memory-management machinery for this? If the latter, how can it be done in terms of bare-metal programming?
Adding something to the other answer, which is fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
First thing: the heap is totally software, while the stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instructions) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separate stack area - totally different from the other RAM areas where the program could create a "heap".
Second thing: when speaking about "programs", one can/should think of the operating system (the OS) as a program, specialized in managing resources (memory included) and extendable with "applications" (which are programs). In such a scenario, the stack and the heap are managed in cooperation by both the OS and the applications.
So, to reply to your main question, the 90%-correct answer is: on bare metal you already have a stack - perhaps you have to issue a few short instructions to set it up, but it is straightforward. But you don't have a heap; you must implement it in your program. First you set aside some memory to be used as the stack; then you can set aside some more memory to be used as a heap, not forgetting that you must also reserve some memory for normal/static data. The part of the program that manages the heap should know what to do, using but not erratically overwriting the stack and the static data, to perform its functions.
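To make the "implement it in your program" part concrete, here is a minimal sketch of a do-it-yourself heap: a fixed region of RAM handed out by a bump allocator. The region size is a placeholder, there is no free(), and alignment handling is kept to the bare minimum - it only illustrates that a heap is just software bookkeeping over memory you set aside. The stack, by contrast, typically only needs the startup code to load the stack pointer register.

    #include <stddef.h>
    #include <stdint.h>

    #define HEAP_SIZE 4096u                  /* placeholder size */

    static uint8_t heap_area[HEAP_SIZE];     /* the RAM we set aside as "heap" */
    static size_t  heap_used;                /* bump pointer */

    void *my_alloc(size_t n)
    {
        /* Round up so the next allocation stays word-aligned. */
        n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
        if (n > HEAP_SIZE - heap_used)
            return NULL;                     /* out of heap */
        void *p = &heap_area[heap_used];
        heap_used += n;
        return p;
    }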

Need of executable stack and heap memory

As we know, making the stack and the heap areas of virtual memory non-executable can prevent the execution of malicious code (like shellcode) injected into memory (the technique is called Data Execution Prevention). And the simplest way to inject malicious code into memory is by overflowing a buffer. Thus, making these areas of memory non-executable can help reduce the severity of overflow attacks.
However, there are many other techniques, like address space randomization, pointer protection, use of canaries etc., that are used to prevent such attacks. I think most systems make use of these other methods instead of making the stack/heap memory non-executable. (Please correct me if I am wrong here.)
Now, my question is, are there some specific operations or special cases in which the stack/heap parts of memory are required to be executable?
JITs map writeable and executable regions of memory or simply mprotect previously allocated memory to make it executable.
GCC used to require a system-dependent method to mark parts of the stack executable for its trampoline code (used for nested functions). This was 12 years ago though; I don't know how it's done today.
Dynamic linking on many systems also needs the ability to write to a jump table for function calls resolved at run time. Keeping the jump table non-writeable between updates to the table can be quite costly.
Generally it's possible to solve those problems safely by trying to enforce a policy where memory is writeable or executable, but never both. Memory can be remapped to be writeable when the write needs to be done and then protected again to make it executable. It trades off some performance (not that much) for better security and slightly more complex code.
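As an illustrative sketch of that write-then-execute policy (x86-64 Linux assumed; the machine code bytes below are just a toy function that returns 42): map a page read/write, copy the code in, then flip the page to read/execute before calling it, so it is never writable and executable at the same time.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* mov eax, 42 ; ret  (x86-64, System V ABI) */
        static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) { perror("mmap"); return 1; }

        memcpy(page, code, sizeof code);       /* write while the page is W */

        if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) {
            perror("mprotect");                /* now X, no longer W */
            return 1;
        }

        int (*fn)(void) = (int (*)(void))page; /* not strictly ISO C, but the
                                                  usual idiom on POSIX systems */
        printf("jitted function returned %d\n", fn());
        munmap(page, 4096);
        return 0;
    }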

what is Address space layout randomization [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Memory randomization as application security enhancement?
Hi,
Can someone please explain what address space layout randomization is and how it is implemented? How does this technique affect the stack, heap and static data? I am also interested in any papers that explain address space layout randomization.
Thanks & Regards,
Mousey.
ASLR is a technique designed to make various types of buffer overruns more difficult to exploit, by moving segments around a bit. The stack could be shifted a few bytes (or pages), the sections of your program (and even the libraries your code uses) can be loaded at different addresses, etc.
Buffer overflows usually work by tricking the CPU into running code at a certain address (often on the stack). ASLR complicates that by making the address harder to predict, since it can change each and every time the program runs. So often, instead of running arbitrary code, the program will just crash. This is obviously a bad thing, but not as bad as if some random joker were allowed to take control of your server.
A very simple, crude form of ASLR can actually be implemented without any help from the OS, by simply subtracting some small amount from the stack pointer. (It's a little tricky to do in higher-level languages, somewhat simpler in C - and downright trivial in ASM.) That'll only protect against overflows that use the stack, though. The OS is more helpful; it can change all sorts of stuff if it feels like it. How much it does depends on your OS, though.
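For illustration, here is one way the "subtract a small amount from the stack pointer" trick could look in C, by burning a random amount of stack with alloca() before the real work starts. This is toy-quality randomness meant only to show the idea; real ASLR should come from the OS.

    #include <alloca.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void real_main(void)
    {
        int local;
        /* Prints a slightly different address on each run. */
        printf("address of a stack variable: %p\n", (void *)&local);
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        /* Shift the stack by a small random amount (16..271 bytes here). */
        volatile char *pad = alloca((rand() % 256) + 16);
        pad[0] = 0;               /* touch it so the allocation isn't dropped */
        real_main();
        return 0;
    }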

Linux C debugging library to detect memory corruptions [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
When working some time ago on an embedded system with a simple MMU, I used to program this MMU dynamically to detect memory corruption.
For instance, at some point at runtime, the foo variable was overwritten with some unexpected data (probably via a dangling pointer or whatever). So I added this additional debugging code:
at init, the memory used by foo was indicated as a forbidden region to the MMU;
each time foo was accessed on purpose, access to the region was allowed just before then forbidden just after;
an MMU IRQ handler was added to dump the master and the address responsible for the violation.
This was actually some kind of watchpoint, but directly self-handled by the code itself.
Now, I would like to reuse the same trick, but on an x86 platform. The problem is that I am very far from understanding how the MMU works on this platform and how it is used by Linux, but I wonder if any library/tool/system call already exists to deal with this problem.
Note that I am aware that various tools like Valgrind or GDB exist to manage memory problems, but as far as I know, none of these tools can be dynamically reconfigured by the debugged code itself.
I am mainly interested for user space under Linux, but any info on kernel mode or under Windows is also welcome!
You can use the mmap (MAP_ANONYMOUS) and mprotect functions to manipulate the virtual memory system and use the corresponding protection flags. Your variables need to be constrained to a multiple of the system page size of course. Lots of small variables will present a significant overhead.
Of course your application needs to work correctly when managing access rights to the memory regions. You also need to use mmap() instead of malloc for the protected regions.
This is the user space interface layer to the MMU, in a relatively portable fashion.
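A sketch of what that looks like in practice: give the variable its own anonymous mapping and toggle its protection around the intentional accesses.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

        /* One whole page dedicated to "foo". */
        void *mem = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) { perror("mmap"); return 1; }
        int *foo = mem;

        *foo = 42;                                      /* intentional write */
        mprotect(mem, pagesz, PROT_NONE);               /* forbid all access */

        /* ... any stray access to *foo now raises SIGSEGV ... */

        mprotect(mem, pagesz, PROT_READ | PROT_WRITE);  /* allow on purpose */
        printf("foo = %d\n", *foo);
        munmap(mem, pagesz);
        return 0;
    }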
mmap and mprotect
Two good options:
dmalloc is a library that replaces malloc() and free() with extensive debugging versions, capable of using page boundaries to detect memory overruns/underruns, filling allocated and freed memory, leak-checking, and more.
valgrind is a memory debugger that allows very precise memory debugging (detecting accurately any out-of-bounds access) at the expense of program speed (programs run substantially slower under it). It can also do leak checking.
I think the best you're going to be able to do is to fire off a watchdog thread that keeps a copy of the value and continually compares its copy to the working value. You won't be able to catch exactly when the value is overwritten, but you'll be notified to within whatever granularity you want (i.e., if you set the thread to check every 10ms you'll be notified within 10ms).
The mprotect() system call is what you're after. This lets you change the protections on a memory region.
Memory protection on x86 under Linux is done at the level of a page - 4096 bytes. So you will have to arrange for your protected variable to live on its own page(s), not shared with any other variables. One way to arrange for this is to use posix_memalign() to allocate the memory for the variable, using 4096 as the alignment and rounding the size up to the next multiple of 4096. (Actually, you can use sysconf(_SC_PAGESIZE) to determine the page size in a portable manner, rather than using a hardcoded value.) Another way is to allocate the variable within a union that pads it out to a multiple of the page size, and use the GCC attribute __attribute__ ((aligned (4096))) to align the variable.
In place of your MMU IRQ handler, you simply install a signal handler for the SIGSEGV signal using the sa_sigaction member of the structure passed to the sigaction() function. Your signal handler will be passed a siginfo_t structure as its second argument, whose si_addr member contains the faulting address.
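Putting the pieces of this answer together, a minimal sketch could look like the following. A real handler has to either re-enable access, longjmp away, or abort, otherwise the faulting instruction simply restarts and faults forever; here it just logs the address and unprotects the page so the access completes. (Also note that fprintf is not async-signal-safe; it is used here only for brevity.)

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    static void  *guarded_page;
    static size_t pagesz;

    static void segv_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;
        fprintf(stderr, "unexpected access to %p\n", info->si_addr);
        /* Re-enable access so the restarted instruction can complete. */
        mprotect(guarded_page, pagesz, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        pagesz = (size_t)sysconf(_SC_PAGESIZE);
        guarded_page = mmap(NULL, pagesz, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (guarded_page == MAP_FAILED) { perror("mmap"); return 1; }

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = segv_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        int *p = guarded_page;
        *p = 123;                 /* faults; the handler reports and unprotects */
        printf("after the handler ran, *p = %d\n", *p);
        return 0;
    }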
Electric Fence is sort of old but still maintained and useful. A number of people have used it as the starting point for more complex debugging. It's extremely easy to modify.
I am also a huge fan of Valgrind, but Valgrind isn't available on all platforms.
