What is address space layout randomization? [duplicate]

Can someone please explain what address space layout randomization (ASLR) is and how it is implemented? How does this technique affect the stack, heap, and static data? I am also interested in any papers that explain address space layout randomization.

ASLR is a technique designed to make various types of buffer overruns more difficult to exploit, by moving segments around a bit. The stack could be shifted a few bytes (or pages), the sections of your program (and even the libraries your code uses) can be loaded at different addresses, etc.
Buffer overflows usually work by tricking the CPU into running code at a certain address (often on the stack). ASLR complicates that by making the address harder to predict, since it can change each and every time the program runs. So often, instead of running arbitrary code, the program will just crash. This is obviously a bad thing, but not as bad as if some random joker were allowed to take control of your server.
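You can see the effect yourself. Here's a minimal C sketch that prints one address from each region; run it a few times, and with ASLR enabled the values change between runs (with it disabled, they stay put):

    #include <stdio.h>
    #include <stdlib.h>

    int global_var;                     /* static data */

    int main(void)
    {
        int stack_var;                  /* stack */
        void *heap_var = malloc(1);     /* heap */

        printf("code   : %p\n", (void *)main);
        printf("static : %p\n", (void *)&global_var);
        printf("stack  : %p\n", (void *)&stack_var);
        printf("heap   : %p\n", heap_var);

        free(heap_var);
        return 0;
    }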
A very simple, crude form of ASLR can actually be implemented without any help from the OS, by simply subtracting some small amount from the stack pointer. (It's a little tricky to do in higher-level languages, but somewhat simpler in C -- and downright trivial in ASM.) That'll only protect against overflows that use the stack, though. The OS is more helpful; it can change all sorts of stuff if it feels like it. How much it does depends on your OS.
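For illustration only, here's roughly what that crude trick looks like in C: use alloca() to pull the stack pointer down by a random amount before the real work starts (alloca.h is a glibc-ism; this is a sketch of the idea, not real ASLR):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <alloca.h>

    static int real_main(void)
    {
        int local;
        printf("a stack variable lives at %p\n", (void *)&local);
        return 0;
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        /* alloca() moves the stack pointer down by a run-time amount;
           the memory is released automatically when main() returns. */
        volatile char *pad = alloca((rand() % 4096) + 16);
        pad[0] = 0;     /* touch it so the allocation isn't optimized away */
        return real_main();
    }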


How to know/limit static stack size in C program with GCC/Clang compiler? [duplicate]

I'm writing an embedded program that uses a static, limited stack area of a known size (in other words, I have X bytes for the stack, and there's no underlying OS that can allocate more stack on demand for me). I want to avoid errors at runtime and catch them at build time instead - to have some indication if I mistakenly declared too many variables in some function block that won't fit in the stack at runtime.
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on the stack all my local variables will take on the deepest function call path? Or at least know how much space my variables will take in a single block (function), if the compiler is not smart enough to analyze it across all the nested calls?
Given that I don't use recursive calls in my program, can I somehow know during compilation time how much space on the stack all my local variables will take on the deepest function call path?
Only if you don't use interrupts - and it's extremely unlikely that an embedded system uses no interrupts. So you'll have to find out stack use with dynamic analysis.
The old school way is to set your whole stack area to a value like 0xAA upon reset from a debugger, then let the program run for a while, making sure to provoke all use-cases. Then halt and inspect how far down the memory still holds 0xAA. It isn't a 100% scientific, fool-proof method, but it works just fine in practice in the vast majority of cases.
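A sketch of that old-school technique in C, assuming a descending stack and hypothetical linker-script symbols __stack_start__/__stack_end__ (the real names depend on your linker script):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical symbols for a descending stack occupying
       [__stack_start__, __stack_end__). */
    extern uint8_t __stack_start__[];   /* lowest address of the stack area */
    extern uint8_t __stack_end__[];     /* one past the highest address */

    #define STACK_FILL 0xAAu

    /* Call early after reset: paint everything below the current stack
       pointer (approximated by a local variable's address). */
    void stack_paint(void)
    {
        uint8_t marker;
        for (uint8_t *p = __stack_start__; p + 16 < &marker; ++p)
            *p = STACK_FILL;
    }

    /* After exercising all use-cases, count the fill bytes that survive
       at the bottom: that's the headroom the program never touched. */
    size_t stack_headroom(void)
    {
        const uint8_t *p = __stack_start__;
        while (p < __stack_end__ && *p == STACK_FILL)
            ++p;
        return (size_t)(p - __stack_start__);
    }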
Other methods involve setting write breakpoints at certain stack locations where you don't expect the program to end up, sort of like a "hardware stack canary". Run the program and ensure that the breakpoint never triggers. If it does, then investigate from there, move the breakpoint further down the memory map to see exactly where.
Another good practice is to always memory map your stack so that it can only overflow into forbidden memory or at least into read-only flash etc - ideally you'd get a hardware exception for stack overflow. You definitely want to avoid the stack overflowing into other RAM sections like .data/.bss, as that will cause severe and extremely subtle error scenarios.
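For the "single block (function)" part of the question, GCC does offer some static help: -fstack-usage emits a .su file with each function's frame size, and -Wstack-usage=N warns when one function exceeds N bytes. It measures per-function frames only, not whole call paths or interrupts:

    /* foo.c - compile with:  gcc -c -fstack-usage -Wstack-usage=256 foo.c
       GCC writes foo.su listing each function's static stack usage, and
       warns here, since the frame below is larger than 256 bytes. */
    void foo(void)
    {
        volatile char buf[512];     /* volatile so it isn't optimized out */
        buf[0] = 0;
    }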

What is the main origin of heap and stack memory division?

I have read a lot of explanations of heap and stack memory, and all of them are obscure about the origin of the division. First of all, I understand how these memories work with software, but I don't understand the main source of this division. I assume they are the same unspecialized physical memory, but...
For example, say we have a PC without any OS, and we want to create some bootable program in assembly language for x86. I assume we can do this (I personally don't know assembly, but people write OSes anyway). So the main question is: can we already operate with a heap and a stack, or must we first create some memory management machinery for this? If the latter, how is that possible in terms of bare-metal programming?
Adding something to the other answer, which is fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells that a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
First: the heap is totally software, while the stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instructions) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separate stack area - totally different from the other RAM areas where the program could create a "heap".
Second: when speaking about "programs", one can (and should) think of the operating system (the OS) as a program, specialized in managing resources (memory included), and extensible with "applications" (which are programs too). In such a scenario, the stack and heap are managed in cooperation by both the OS and the applications.
So, to reply to your main question, the 90%-correct answer is: on bare metal you already have a stack - perhaps you have to issue a few short instructions to set it up, but it is straightforward. You do not have a heap, though; you must implement one in your program. First you set aside some memory to be used as the stack; then you can set aside some more memory to be used as the heap, not forgetting that you must also preserve some memory for normal/static data. The part of the program that manages the heap has to know what it is doing, using but never erratically overwriting the stack and the static data.
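To make that concrete, here's a minimal sketch of the kind of heap you might implement yourself on bare metal: a bump allocator over a reserved array (names and sizes are illustrative, and there is no free()):

    #include <stddef.h>
    #include <stdint.h>

    #define HEAP_SIZE 4096u

    static uint8_t heap[HEAP_SIZE];     /* the memory set aside as "heap" */
    static size_t  heap_used;

    void *bare_malloc(size_t n)
    {
        n = (n + 7u) & ~(size_t)7u;     /* keep allocations 8-byte aligned */
        if (n > HEAP_SIZE - heap_used)
            return NULL;                /* out of heap */
        void *p = &heap[heap_used];
        heap_used += n;
        return p;
    }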

does buffer overflow still exist?

I was watching a university lecture about buffer overflow, and the professor ended up saying
even if we were able to fill the buffer with exploit code and jumped
into that code, we still can not execute it..
the reasons - he mentioned - are:
programmers avoid the use of functions that cause overflow.
randomized stack offsets: at start of program, allocate random amount of space on stack to make it difficult to predict the beginning of inserted code.
use techniques to detect stack corruption.
non-executable code segments: only allow code to execute from "text" sections of memory.
Now I wonder: do buffer overflow attacks still exist nowadays, or are they out of date?
A detailed answer will be very much appreciated!
programmers avoid the use of functions that cause overflow.
Not all of us. There's a bunch of new programmers every day. Does our collective knowledge that strcpy is bad get disseminated to them magically? I don't think so.
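To illustrate that first point, here's the classic strcpy() pitfall next to a bounded alternative (buffer sizes are arbitrary):

    #include <stdio.h>
    #include <string.h>

    /* strcpy() copies until the NUL byte with no bounds check, so a long
       input writes past buf and corrupts the stack. */
    void bad(const char *input)
    {
        char buf[16];
        strcpy(buf, input);                 /* unsafe: no length check */
        printf("%s\n", buf);
    }

    /* snprintf() always NUL-terminates and never writes past the buffer. */
    void better(const char *input)
    {
        char buf[16];
        snprintf(buf, sizeof buf, "%s", input);
        printf("%s\n", buf);
    }

    int main(int argc, char **argv)
    {
        if (argc > 1)
            better(argv[1]);
        return 0;
    }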
randomized stack offsets: at start of program, allocate random amount of space on stack to make it difficult to predict the beginning of inserted code.
Difficult, yes. Impossible, no. Any vulnerability that can be turned into an arbitrary read can defeat such protections trivially.
use techniques to detect stack corruption.
Indeed we can detect stack corruption, under certain circumstances. Canaries, for instance, may be overwritten; their value is compiler-dependent; and they might not protect against all kinds of stack corruption (e.g. GCC's -fstack-protector-strong protects against EIP overwrite, but not against other kinds of overrun).
non-executable code segments: only allow code to execute from "text" sections of memory.
W^X memory is a reality, but how many OSes have adopted it for the stack? That'd be an interesting little research project for your weekend. :) Additionally, if you look into return-oriented programming (ROP) techniques (return-to-libc is an application of it), you'll see that this protection, too, can be bypassed.

Force memory allocation always to the same virtual address [duplicate]

I'm experimenting with Pin, an instrumentation tool, which I use to compute some statistics based on memory address of my variables. I want to re-run my program with the information gathered by my instrumentation tool, but for that it's crucial that virtual memory addresses remain the same through different runs.
In general, I should let the OS handle memory allocation, but in this case I need some kind of way to force it to always allocate to the same virtual address. In particular, I'm interested in a very long array, which I'm currently allocating with numa_alloc_onnode(), though I could use something else.
What would be the correct way to proceed?
You could try mmap(2).
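For example, a Linux-specific sketch that asks for a fixed address (the address and size are arbitrary illustrations; MAP_FIXED_NOREPLACE needs Linux 4.17+, and plain MAP_FIXED would silently clobber an existing mapping instead of failing):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #define ARRAY_BYTES (1u << 20)

    int main(void)
    {
        /* Pick an address known to be free in your process's layout. */
        void *want = (void *)0x700000000000;
        void *p = mmap(want, ARRAY_BYTES, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                       -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        printf("array mapped at %p\n", p);
        munmap(p, ARRAY_BYTES);
        return 0;
    }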
The instrumented version of your program will use a different memory layout than the original program, because Pin needs memory for the dynamic translation etc. and will change the memory layout (if I recall correctly).
With the exception of address space layout randomization, most memory allocators, loaders, and system routines for assigning virtual memory addresses will return the same results given the same calls and data - not by deliberate design, but as a natural consequence of how software works. So, you need to:
Disable address space layout randomization.
Ensure your program executes in the same way each time.
Address space layout randomization deliberately changes the address space to foil attackers: if the addresses change on each program execution, it is more difficult for attacks to use various exploits to control the code that is executed. It should be disabled only temporarily and only for debugging purposes. This answer shows one method of doing that and links to more information, but the exact method may depend on the version of Linux you are using.
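One Linux-specific way (I'm assuming this is roughly what the linked answer shows; setarch -R ./prog does the same from the shell) is for the process to clear its own randomization with personality() and re-exec itself:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/personality.h>

    int main(int argc, char **argv)
    {
        (void)argc;
        /* personality(0xffffffff) queries the current persona without
           changing it; the persona survives execv(), so this runs once. */
        if (!(personality(0xffffffff) & ADDR_NO_RANDOMIZE)) {
            personality(personality(0xffffffff) | ADDR_NO_RANDOMIZE);
            execv("/proc/self/exe", argv);  /* restart without ASLR */
        }
        int x;
        printf("stack variable at %p (stable across runs)\n", (void *)&x);
        return 0;
    }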
Your program may execute differently for a variety of reasons, such as using threads or using asynchronous signals or interprocess communication. It will be up to you to control that in your program.
Generally, memory allocation is not guaranteed to be reproducible. The results you get may be on an as-is basis.

Allocating a new call stack

(I think there's a high chance of this question either being a duplicate or otherwise answered here already, but searching for the answer is hard thanks to interference from "stack allocation" and related terms.)
I have a toy compiler I've been working on for a scripting language. In order to be able to pause the execution of a script while it's in progress and return to the host program, it has its own stack: a simple block of memory with a "stack pointer" variable that gets incremented using the normal C code operations for that sort of thing and so on and so forth. Not interesting so far.
At the moment I compile to C. But I'm interested in investigating compiling to machine code as well - while keeping the secondary stack and the ability to return to the host program at predefined control points.
So... I figure it's not likely to be a problem to use the conventional stack registers within my own code, I assume what happens to registers there is my own business as long as everything is restored when it's done (do correct me if I'm wrong on this point). But... if I want the script code to call out to some other library code, is it safe to leave the program using this "virtual stack", or is it essential that it be given back the original stack for this purpose?
Answers like this one and this one indicate that the stack isn't a conventional block of memory, but that it relies on special, system specific behaviour to do with page faults and whatnot.
So:
is it safe to move the stack pointers into some other area of memory? Stack memory isn't "special"? I figure threading libraries must do something like this, as they create more stacks...
assuming any area of memory is safe to manipulate using the stack registers and instructions, I can think of no reason why it would be a problem to call any functions with a known call depth (i.e. no recursion, no function pointers) as long as that amount is available on the virtual stack. Right?
stack overflow is obviously a problem in normal code anyway, but would there be any extra-disastrous consequences to an overflow in such a system?
This is obviously not actually necessary, since simply restoring the pointers to the real stack would be perfectly serviceable - or, for that matter, not abusing them in the first place and just putting up with fewer registers - and I probably shouldn't try to do it at all (not least because I'm obviously out of my depth). But I'm still curious either way. I want to know how these sorts of things work.
EDIT: Sorry of course, should have said. I'm working on x86 (32-bit for my own machine), Windows and Ubuntu. Nothing exotic.
All of these answers are based on "common processor architectures", and since this involves generating assembler code, it has to be target-specific - if you decide to do this on processor X, which has some weird handling of stacks, the below is obviously not worth the screen surface it's written on [substitute for paper]. For x86 in general, the below holds unless otherwise stated.
is it safe to move the stack pointers into some other area of memory?
Stack memory isn't "special"? I figure threading libraries
must do something like this, as they create more stacks...
The memory as such is not special. This does, however, assume that it's not an x86 architecture where the stack segment is used to limit stack usage. Whilst that is possible, it's rather rare to see in an implementation. I know that some years ago Nokia had a special operating system using segments in 32-bit mode. As far as I can think of right now, that's the only one I've had any contact with that uses the stack segment in the way x86 segmentation mode describes.
Assuming any area of memory is safe to manipulate using the stack
registers and instructions, I can think of no reason why it would be a
problem to call any functions with a known call depth (i.e. no
recursion, no function pointers) as long as that amount is available
on the virtual stack. Right?
Correct. Just as long as you don't expect to be able to get back to some other function without switching back to the original stack. Limited level of recursion would also be acceptable, as long as the stack is deep enough [there are certain types of problems that are definitely hard to solve without recursion - binary tree search for example].
stack overflow is obviously a problem in normal code anyway,
but would there be any extra-disastrous consequences to an overflow in
such a system?
Indeed, it would be a tough bug to crack if you are a little unlucky.
I would suggest that you use a call to VirtualProtect() (Windows) or mprotect() (Linux etc.) to mark the "end of the stack" as unreadable and unwritable, so that if your code accidentally walks off the stack, it crashes properly rather than producing some other, more subtle undefined behaviour [because it's not guaranteed that the memory just below (at lower addresses) is unavailable, so you could overwrite some other useful things if the code does go off the stack, and that would cause some very hard-to-debug bugs].
It also helps to add a bit of code that occasionally checks the stack depth: you know where your stack starts and ends, so it shouldn't be hard to check whether a particular stack value is "outside the range", provided you give yourself some "extra buffer space" between the top of the stack and the "we're dead" zone that you protected - a "crumple zone", as they would call it if it were a car in a crash. You can also fill the entire stack with a recognisable pattern and check how much of it is "untouched".
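Putting those two suggestions together, a POSIX-flavoured sketch (on Windows you'd use VirtualAlloc()/VirtualProtect() instead; sizes and the fill pattern are arbitrary choices):

    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define STACK_BYTES (64 * 1024)

    /* Allocate a custom stack with a PROT_NONE guard page below it, so
       walking off the end faults immediately instead of silently
       corrupting whatever lies at lower addresses. */
    void *make_stack(void)
    {
        long page = sysconf(_SC_PAGESIZE);
        size_t total = STACK_BYTES + (size_t)page;
        void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;
        if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
            munmap(base, total);
            return NULL;
        }
        /* Fill the usable area with a recognisable pattern so you can
           later check how much of it was ever touched. */
        memset((char *)base + page, 0xAA, STACK_BYTES);
        /* x86 stacks grow downwards: hand back the high end. */
        return (char *)base + total;
    }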
Typically, on x86, you can use the existing stack without any problems so long as:
you don't overflow it
you don't increment the stack pointer register (with pop or add esp, positive_value / sub esp, negative_value) beyond what your code starts with (if you do, interrupts or asynchronous callbacks (signals) or any other activity using the stack will trash its contents)
you don't cause any CPU exception (if you do, the exception handling code might not be able to unwind the stack to the nearest point where the exception can be handled)
The same applies to using a different block of memory for a temporary stack and pointing esp to its end.
The problem with exception handling and stack unwinding has to do with the fact that your compiled C and C++ code contains exception-handling-related data structures, such as the ranges of eip with links to their respective exception handlers (which tell where the closest exception handler is for every piece of code), plus some information related to identifying the calling function (i.e. where the return address is on the stack, etc.), so that exceptions can bubble up. If you just plug raw machine code into this "framework", you won't properly extend these exception-handling data structures to cover it, and if things go wrong, they'll likely go very wrong (the entire process may crash or become corrupted, despite you having exception handlers around the generated code).
So, yeah, if you're careful, you can play with stacks.
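As a concrete (if POSIX-deprecated, still present in glibc) way to do exactly this portably, makecontext()/swapcontext() runs a function on a caller-supplied stack; coroutine and green-thread libraries do essentially the same, often with hand-written assembly:

    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    #define STACK_BYTES (64 * 1024)

    static ucontext_t main_ctx, script_ctx;

    static void script_entry(void)
    {
        int local;
        printf("on the custom stack, local at %p\n", (void *)&local);
        /* returning resumes uc_link, i.e. main_ctx */
    }

    int main(void)
    {
        getcontext(&script_ctx);
        script_ctx.uc_stack.ss_sp   = malloc(STACK_BYTES);
        script_ctx.uc_stack.ss_size = STACK_BYTES;
        script_ctx.uc_link          = &main_ctx;
        makecontext(&script_ctx, script_entry, 0);

        int local;
        printf("on the original stack, local at %p\n", (void *)&local);
        swapcontext(&main_ctx, &script_ctx);    /* switch stacks and run */
        printf("back on the original stack\n");

        free(script_ctx.uc_stack.ss_sp);
        return 0;
    }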
You can use any region you like for the processor's stack (modulo the memory protections).
Essentially, you simply load the ESP register ("MOV ESP, ...") with a pointer to the new area, however you managed to allocate it.
You have to have enough for your program, and whatever it might call (e.g., a Windows OS API), and whatever funny behaviours the OS has. You might be able to figure out how much space your code needs; a good compiler can easily do that. Figuring how much is needed by Windows is harder; you can always allocate "way too much" which is what Windows programs tend to do.
If you decide to manage this space tightly, you'll probably have to switch stacks to call Windows functions. Even that won't be enough; you'll likely get burned by various Windows surprises. I describe one of them here: Windows: avoid pushing full x86 context on stack. I have mediocre solutions, but no good solutions for this.