I've been reading about the stack and memory addresses in a few tutorials, and I'm wondering why their references to low and high memory locations differ.
This is confusing.
E.g.
One tutorial has the low memory addresses at the top and the high memory addresses at the bottom.
Another has the low memory addresses at the bottom and the high memory addresses at the top.
When I try it with a simple C program, the low memory addresses seem to be at the top: bfbf7498 > bfbf749c > bfbf74a0 > bfbf74a4 > bfbf74a8 > bfbf74ac
user#linux:~$ cat > stack.c
#include <stdio.h>
int main()
{
int A = 3;
int B = 5;
int C = 8;
int D = 10;
int E = 11;
int F;
F = B + D;
printf("+-----------+-----+-----+-----+\n");
printf("| Address | Var | Dec | Hex |\n");
printf("|-----------+-----+-----+-----|\n");
printf("| %x | F | %d | %X |\n",&F,F,F);
printf("| %x | E | %d | %X |\n",&E,E,E);
printf("| %x | D | %d | %X |\n",&D,D,D);
printf("| %x | C | 0%d | %X |\n",&C,C,C);
printf("| %x | B | 0%d | %X |\n",&B,B,B);
printf("| %x | A | 0%d | %X |\n",&A,A,A);
printf("+-----------+-----+-----+-----+\n");
}
user#linux:~$
user#linux:~$ gcc -g stack.c -o stack ; ./stack
+-----------+-----+-----+-----+
| Address | Var | Dec | Hex |
|-----------+-----+-----+-----|
| bfbf7498 | F | 15 | F |
| bfbf749c | E | 11 | B |
| bfbf74a0 | D | 10 | A |
| bfbf74a4 | C | 08 | 8 |
| bfbf74a8 | B | 05 | 5 |
| bfbf74ac | A | 03 | 3 |
+-----------+-----+-----+-----+
user#linux:~$
It isn't exactly clear what your question is. What Arjun was answering is why stack memory grows down (decreasing memory addresses) instead of up (increasing memory addresses), and the simple answer is that the choice is arbitrary. It really doesn't matter, but an architecture has to pick one or the other: there are typically CPU instructions that manipulate the stack, and they expect a particular implementation.
The other possible question you may be asking is about the visual conventions used by different sources. In your example above, one diagram shows low addresses at the top and the other shows low addresses at the bottom. Both show the stack growing down from larger addresses to smaller addresses. Again, this is arbitrary; the authors had to choose one orientation or the other, and they are simply telling you which one they chose. If you want to compare the diagrams side by side, you may want to flip one so they have the same orientation.
By the way, your example code shows that the stack does indeed start at high addresses and grow down: the memory for 'A' is allocated first and has a higher address than the others.
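For instance, a small sketch along these lines (not from the question; just an illustration) makes the direction visible by printing the address of a local variable at several call depths. On most platforms the addresses shrink as the depth grows, i.e. the stack grows toward lower addresses:

#include <stdio.h>

/* Print the address of a local variable at several call depths. */
static void probe(int depth)
{
    int local;
    printf("depth %d: local at %p\n", depth, (void *)&local);
    if (depth < 3)
        probe(depth + 1);
}

int main(void)
{
    probe(0);
    return 0;
}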
Why does the stack address grow towards decreasing memory addresses?
This thread has a pretty good answer to your question. It also has a pretty good visual.
https://unix.stackexchange.com/questions/4929/what-are-high-memory-and-low-memory-on-linux
This is also a pretty good explanation (though it is specific to Unix/Linux).
Essentially, though, it is entirely dependent on the platform.
An explanation of why stacks grow differently on different systems:
There are a number of different methods in use, depending on the OS (e.g. real-time vs. normal Linux) and the language runtime system involved:
1) dynamic, by page fault
Typically, a few real pages are preallocated at high addresses and the initial stack pointer is set to point at them. The stack grows downward while the heap grows upward. If a page fault occurs slightly below the current bottom of the stack, the missing intermediate pages are allocated and mapped in, effectively growing the stack from the top toward the bottom automatically. There is usually a maximum up to which this automatic allocation is performed, which may or may not be adjustable via the environment (ulimit), the executable header, or a system call made by the program itself (rlimit); this adjustability in particular varies heavily between OSes. There is also typically a limit on how far below the current stack bottom a page fault may occur and still be considered acceptable for automatic growth. Notice that not all systems' stacks grow downward: under HP-UX the stack used to grow upward, so I am not sure what Linux on PA-RISC does (perhaps someone can comment on this).
2) fixed size
Other OSes (especially in embedded and mobile environments) either have a fixed size by definition, or take it from the executable header, or take it from the call that creates the program/thread. Especially in embedded real-time controllers, this is often a configuration parameter, and individual control tasks get fixed stacks (to prevent a runaway thread from eating the memory of higher-priority control tasks). Of course, even in this case the memory might only be allocated virtually until it is really needed.
3) pagewise, spaghetti and similar
Such mechanisms tend to be forgotten, but they are still in use in some runtime systems (I know of Lisp/Scheme and Smalltalk systems). These allocate and grow the stack dynamically as required, not as a single contiguous segment but as a linked chain of multi-page chunks. This requires the compiler(s) to generate different function entry/exit code in order to handle segment boundaries, so such schemes are typically implemented by a language support system rather than the OS itself (which used to be different in earlier times, sigh). The reason is that when you have many (say thousands of) threads in an interactive environment, preallocating, say, 1 MB per stack would simply fill your virtual address space, and you could not support a system where the needs of an individual thread are unknown in advance (which is typically the case in a dynamic environment, where the user might enter eval code into a separate workspace). So dynamic allocation as in scheme 1 above is not possible, because other threads' stacks would be in the way. Instead the stack is made up of smaller segments (say 8-64 KB) which are allocated and deallocated from a pool and linked into a chain of stack segments. Such a scheme may also be required for high-performance support of things like continuations, coroutines, etc.
Modern Unixes/Linuxes and (I guess, but am not 100% certain) Windows use scheme 1) for the main thread of your executable, and scheme 2) for additional (p)threads, which need a fixed stack size given by the thread creator up front. Most embedded systems and controllers use fixed (but configurable) preallocation (in many cases even physically preallocated).
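As an illustration of scheme 2), this is roughly how a fixed stack size is requested for a POSIX thread; the 64 KiB figure is an arbitrary choice for the sketch, and the snippet assumes a POSIX system (compile with -pthread):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    puts("running with a fixed-size stack");
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    /* Ask for a 64 KiB stack; PTHREAD_STACK_MIN imposes a lower bound. */
    pthread_attr_setstacksize(&attr, 64 * 1024);

    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}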
Sorry if this answer is a bit dense; I'm not sure how to simplify it and still give a valid explanation. As for why your C example shows the low memory addresses at the top, the simplest explanation is that the stack on your platform grows downward, so the variable allocated first ends up at the highest address.
Related
So I know that calling free() on a variable allocated on the stack causes an invalid pointer error.
With a malloc'd pointer, malloc() allocates 8 bytes before the actual pointer to record information about its size. So I was wondering: if I had declared a long just before a struct and then called free on that struct, would it be possible to free the struct (this is, of course, going off the assumption that the allocation of those 8 bytes is the only extra thing malloc does)?
I guess my final question is whether there is any real difference between stack variable allocation and heap allocation (in terms of the backend calls to the kernel).
Some C implementations might use data before the allocated space to help them manage the space. Some do not. Some do that for allocations of certain sizes and not others. If they do it, it might be eight bytes, or it might be some other amount. You should not rely on any behavior in this regard.
When you declare a long object and a struct of some sort in a block, the compiler might or might not put them next to each other on the stack. It might put the long before the struct or vice-versa, or, because it optimizes your program, it might keep the long in a register and never put it on the stack at all, and it might do other things. In some C implementations, a long is eight bytes. In some, it is not. There is no good way for you to ensure two separate objects are put in adjacent memory. (You can make them not separate by putting them in a larger struct.)
Even if you are able to cobble together a long followed by a struct, how would you know what value to put into the long? Did the C implementation put the length of the allocation in there? Or is it a pointer to another block? Or to some other part of the database the C implementation uses to track allocated memory? If malloc and free are using memory just before the allocated space, that memory is not empty. It needs to have some value in it, and you do not know what that is.
If you get lucky, passing the address of the struct to free might not crash your program right away. But then you have freed a part of the stack, in some sense. When you call malloc again, the pointer you get back might be for that memory, and then your program presumably will write to that space. Then what happens when your program calls other routines, causing the stack to grow into that space? You will have overlapping uses of the same memory. Some of your data will be stomping over other data, and your program will not work.
Yes, there are differences between memory allocated on the stack and memory allocated from the heap. This is outside of the model that C presents to your program. However, in systems where processes have stack and heap, they are generally in different places in the memory of your process. In particular, the stack memory must remain available for use as the stack grows and shrinks. You cannot mix it with the heap without breaking things.
It is good to ask questions about what happens when you try various things. However, modern implementations of malloc and free are quite complicated, and you pretty much have to accept them as a service that you cannot peer into easily. Instead, to help you learn, you might think about this:
How would you write your own malloc and free?
Write some code that allocates a large amount of memory using malloc, say a megabyte, and write two routines called MyMalloc and MyFree that work like malloc and free, except they use the memory you allocated. When MyMalloc is called, it will carve out a chunk of the memory. When MyFree is called, it will return the chunk to make it available again.
Write some experimental code that somewhat randomly calls MyMalloc with various sizes and MyFree, in somewhat random orders.
How can you make all of this work? How do you divide the megabyte into chunks? How do you remember which chunks are allocated and which are free? When somebody calls MyFree, how do you know how much they are giving back? When neighboring chunks are returned with MyFree, how do you put them back together into bigger pieces again?
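If it helps to have a starting point, here is a minimal sketch of what such a skeleton could look like. The names MyMalloc/MyFree come from the exercise above; the header layout, the first-fit strategy, and the single 1 MiB arena are arbitrary choices, and alignment and backward coalescing are deliberately left out as part of the exercise:

#include <stddef.h>
#include <stdlib.h>

#define ARENA_SIZE (1024 * 1024)

typedef struct Header {
    size_t size;            /* payload size in bytes       */
    int    used;            /* 1 = allocated, 0 = free     */
    struct Header *next;    /* next chunk in address order */
} Header;

static Header *first;       /* start of the chunk list     */

static void *MyMalloc(size_t n)
{
    if (first == NULL) {                      /* lazily set up the arena */
        first = malloc(ARENA_SIZE);
        first->size = ARENA_SIZE - sizeof(Header);
        first->used = 0;
        first->next = NULL;
    }
    for (Header *h = first; h != NULL; h = h->next) {
        if (h->used || h->size < n)
            continue;                         /* first fit */
        if (h->size >= n + sizeof(Header) + 16) {
            /* Split off the unused tail as a new free chunk. */
            Header *rest = (Header *)((unsigned char *)(h + 1) + n);
            rest->size = h->size - n - sizeof(Header);
            rest->used = 0;
            rest->next = h->next;
            h->size = n;
            h->next = rest;
        }
        h->used = 1;
        return h + 1;                         /* payload follows the header */
    }
    return NULL;                              /* arena exhausted */
}

static void MyFree(void *p)
{
    if (p == NULL)
        return;
    Header *h = (Header *)p - 1;              /* step back to the header */
    h->used = 0;
    if (h->next != NULL && !h->next->used) {  /* merge with the next free chunk */
        h->size += sizeof(Header) + h->next->size;
        h->next = h->next->next;
    }
}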
I think your real question is how the stack works.
The stack is one big memory block allocated when your program starts. There is a pointer to the top of the stack. The name is suggestive: think of a stack of magazines.
When a function is called, the parameters are placed on top of the stack. The function itself then places its local variables on top of that. When the function exits, the stack pointer is simply moved back to where it was before the function was called. This frees all the local variables and input arguments used by the function.
The heap manager has nothing to do with this block of memory. Tricking free into putting some of the stack into the heap manager's memory is going to wreak havoc on your program. That memory would likely be used again as you call other functions, and used simultaneously if you malloc memory, leading to data corruption at best and stack corruption (read: crash) at worst.
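To make the "stack pointer is simply moved back" point concrete, here is a small sketch; the second call's frame typically lands on the memory the first call just released, so the two printed addresses usually match, though nothing guarantees it:

#include <stdio.h>

static void show_local(const char *label)
{
    int local = 0;
    printf("%s: local at %p\n", label, (void *)&local);
}

int main(void)
{
    show_local("first call");
    show_local("second call");   /* usually prints the same address */
    return 0;
}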
When you speak of memory being allocated on the stack, you have to understand that in most implementations the stack frame is allocated as a block -- variables are not allocated individually or separately.
                   +-+ +--------------------------------------------------+
                   | | | Stack frame data section; local variables and    |
                   | | |                                                  |
                   | | | function arguments in order determined by the    |
                   | | |                                                  |
                   | | | calling convention of the target platform        |
 Stack frame for   | | |                                                  |
 function call;    | | | (size is implementation dependent)               |
 block allocated   | | |                                                  |
                   | | |                                                  |
                   | | +--------------------------------------------------+
                   | | | Instruction pointer (return address)             |
                   | | +--------------------------------------------------+
                   | | | Space for return value (if not in a CPU register)|
                   +-+ +--------------------------------------------------+
                       |                                                  |
                       |                                                  |
                       |                                                  |
                       |   (stack frame of previously called function)    |
                       |                                                  |
                       |                                                  |
                       +--------------------------------------------------+
Each function call is allocated its own stack frame, with the required size to hold the return value (if necessary), the instruction pointer of the return address, and all of the local variables and function arguments. So while memory for the stack frame is allocated, it's not allocated with respect to any individual variable -- only in regard to the sum of the individual sizes.
Suppose I have the following array:
int list[3]={2,8,9};
printf("%p,%p,%p",(void*)&list[0],(void*)&list[1],(void*)&list[2]);
Is it always guaranteed that &list[0]<&list[1]<&list[2] ?
I had assumed it to be a hard and fast rule while using C, but now I have to be very sure about it, as an OP just asked me about it when I answered his question about endianness:
Little endian or Big endian
What gave me second thoughts is the issue of stacks growing up or down. I am not very sure about that, so your rigorous answers are appreciated. Thanks.
Yes, it's guaranteed that &list[0]<&list[1] and &list[1]<&list[2]. When pointers to elements of the same array are compared, the pointer to the element with the larger subscript will be considered to have larger value. This is specified in C99 6.5.8#5:
pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values
However, it is not guaranteed that the values printed by printf with %p will also follow the same ordering - these values are implementation-defined.
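A small sketch that checks both guarantees; the asserts rely only on what the standard promises, while the text printed by %p is implementation-defined:

#include <assert.h>
#include <stdio.h>

int main(void)
{
    int list[3] = {2, 8, 9};

    /* Guaranteed: higher subscripts compare greater, and elements are contiguous. */
    assert(&list[0] < &list[1] && &list[1] < &list[2]);
    assert(&list[0] + 1 == &list[1]);

    /* Not guaranteed: the particular strings %p produces for these addresses. */
    printf("%p %p %p\n", (void *)&list[0], (void *)&list[1], (void *)&list[2]);
    return 0;
}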
From the C standard ("Section 6.2.5 Types"):
...An array type describes a contiguously allocated nonempty set of objects...
Arrays will be allocated contiguously in "memory".
What Eric and Interjay are saying (something I didn't consider when I initially wrote this, so thank you Eric and Interjay) is that this only applies to virtual memory addresses.
Your machine and OS most likely use a memory management unit (MMU) which creates a virtual address space (where you are working) and maps this onto physical memory in chunk sized blocks (pages).
So what Eric and Interjay are saying is that although the virtual addresses will be contiguous, the chunks of physical memory they map to may be at different addresses.
   Virtual                      Physical

+------------+              +-----------------+
| VMA pg 1   |------------->| PMA 88   (VMA1) |
+------------+              +-----------------+
| VMA pg 2   |-----+                 ...
+------------+     |          big gap in physical
                   |          memory
                   |                 ...
                   |        +-----------------+
                   +------->| PMA 999  (VMA2) |
                            +-----------------+
So, for small arrays (smaller than the page size), this may be true for both VMA and PMA addresses, although most likely PMA != VMA. For arrays larger than the page size, although VMA looks contiguous, PMA may well be disjoint and out of order, as the above diagram tries to show...
Also, I think Interjay and Eric are going a step further and saying that any C address, although contiguous in the C model, might be anywhere in memory. Although this is unlikely as most OS's implement some kind of paging to get a virtual to physical mapping, it can technically be the case I think... this was good to learn to consider, so thanks chaps :)
If you are asking about how memory appears inside the C model, then arrays appear to be contiguous in C code, and the C expression &list[0] < &list[1] is true.
If you are asking about how actual memory appears inside a C implementation, the C standard does not require any particular arrangement of arrays in memory. Most C implementations use consecutive ascending virtual memory for arrays, but descending addresses would be a simple variation. And, at the level of physical memory, arrays are not generally consecutive, because the map from virtual memory to physical memory is determined by the operating system based on whatever it has available and may even change during execution of a process.
Additionally, there is no guarantee that the strings printed by %p are memory addresses.
I'm reading a text on C about the memory segments available for use. The text says the two highest segments are the heap and the stack, which grow towards each other.
Segments:
________
|Text (Machine code)
|________
|Data
|________
|BSS
|________
|Heap (grows towards stack)
|
|
|Stack (grows towards heap)
|________
Creating a simple program to print out the memory locations of variables created in the lower four segments yields the following:
initialized in | Hex Address | Decimal Value
---------------+-------------+--------------
Data           | 0x080497ec  |   134,518,764
BSS            | 0x080497f8  |   134,518,776
Heap           | 0x0804a008  |   134,520,840
Stack          | 0xbffff844  | 3,221,223,524
Is the interpretation that the heap and the stack have ~3 Billion bytes to share between them? The computer I'm working on only has 1 GB of memory, which makes me doubt the accuracy of this interpretation.
There's 3 GB of address space there, but that doesn't mean it has to be mapped (most of it likely isn't). It's just address space; the operating system still has to map physical memory into it when the program asks for it.
This kind of memory model you read about is pretty outdated. Modern operating systems have much more complex memory layouts, the heap doesn't have to grow linearly, stacks are sometimes located below everything else, and text and data don't necessarily have to be next to each other. Add in shared libraries, address space layout randomization and things get very funky.
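For what it's worth, a minimal program along the lines the question describes might look like the sketch below; the variable names are made up, and the printed addresses will differ from machine to machine, especially with address space layout randomization:

#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment */
int uninitialized_global;      /* BSS segment  */

int main(void)
{
    int local = 0;                              /* stack */
    int *heap_obj = malloc(sizeof *heap_obj);   /* heap  */

    printf("data  : %p\n", (void *)&initialized_global);
    printf("bss   : %p\n", (void *)&uninitialized_global);
    printf("heap  : %p\n", (void *)heap_obj);
    printf("stack : %p\n", (void *)&local);

    free(heap_obj);
    return 0;
}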
I've been trying to learn the basics of a heap overflow attack. I'm mostly interested in using a corruption or modification of the chunk metadata as the basis of the attack, but I'm also open to other suggestions. I know that the goal of the exploit should be to overwrite the printf() function pointer with the challenge() function pointer, but I can't seem to figure out how to achieve that write.
I have the following piece of code which I want to exploit, which uses malloc from glibc 2.11.2:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void challenge()
{
    puts("you win\n");
}

int main(int argc, char **argv)
{
    char *inputA, *inputB, *inputC;

    inputA = malloc(32);
    inputB = malloc(32);
    inputC = malloc(32);

    strcpy(inputA, argv[1]);
    strcpy(inputB, argv[2]);
    strcpy(inputC, argv[3]);

    free(inputC);
    free(inputB);
    free(inputA);

    printf("execute challenge to win\n");
}
Obviously, achieving an actual overwrite of an allocated chunk's metadata is trivial. However, I have not been able to find a way to exploit this code using any of the standard techniques.
I have read and attempted to implement the techniques from:
The paper: w00w00 on Heap Overflows
Although the paper is very clear, the unlink technique has been obsolete for some time.
Malloc Maleficarum.txt
This paper expands upon the exploit techniques from the w00w00 days and accounts for the newer versions of glibc. However, of the 5 techniques detailed in the paper, I have not found one whose prerequisites the code above meets.
Understanding the Heap by Breaking It (pdf)
The PDF gives a pretty good review of how the heap works, but it focuses on double-free techniques.
I originally tried to exploit this code by manipulating the size value of the chunk for inputC, so that it pointed back to the head of the inputC chunk. When that didn't work, I tried pointing further back to the chunk of inputB. That is when I realized that the new glibc performs a sanity check on the size value.
How can a user craft an exploit to take advantage of a free, assuming he has the ability to set the allocated chunk's metadata to arbitrary values, and use it to overwrite a value in the GOT or write to any other arbitrary address?
Note: When I write 'arbitrary address' I understand that memory pages may be read only or protected, I mean an address that I can assume I can write to.
Note: I will say before I answer that this is purely an academic answer, not intended to be used for malicious purposes. I am aware of the exercises OP is doing and they are open source and not intended to encourage any users to use these techniques in unapproved circumstances.
I will detail the technique below but for your reference I would take a look at the Vudo malloc tricks (It's referenced in one of your links above) because my overview is going to be a short one: http://www.phrack.com/issues.html?issue=57&id=8
It details how malloc handles creating blocks of memory, pulling memory from lists and other things. In particular the unlink attack is of interest for this attack (note: you're correct that glibc now performs a sanity check on sizes for this particular reason, but you should be on an older libc for this exercise... legacy bro).
From the paper, an allocated block and a free block use the same data structure, but the data is handled differently. See here:
chunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| prev_size: size of the previous chunk, in bytes (used |
| by dlmalloc only if this previous chunk is free) |
+---------------------------------------------------------+
| size: size of the chunk (the number of bytes between |
| "chunk" and "nextchunk") and 2 bits status information |
mem -> +---------------------------------------------------------+
| fd: not used by dlmalloc because "chunk" is allocated |
| (user data therefore starts here) |
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| bk: not used by dlmalloc because "chunk" is allocated |
| (there may be user data here) |
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| |
| |
| user data (may be 0 bytes long) |
| |
| |
next -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| prev_size: not used by dlmalloc because "chunk" is |
| allocated (may hold user data, to decrease wastage) |
+---------------------------------------------------------+
Allocated blocks don't use the fd or bk pointers, but free ones do, and this will be important later. You should know enough programming to understand that free blocks in Doug Lea's malloc are organized into doubly-linked lists (technically there are several lists for different sizes, but that's irrelevant here since the code allocates blocks of the same size). So when you free a particular block, you have to fix up the pointers to keep the list intact.
e.g. say you're freeing block y from the list below:
x <-> y <-> z
Notice that in the diagram above the spots for bk and fd contain the pointers needed to iterate along the list. When malloc wants to take a block P off of the list, it calls, among other things, a macro to fix up the list:

#define unlink( P, BK, FD ) {  \
    BK = P->bk;                \
    FD = P->fd;                \
    FD->bk = BK;               \
    BK->fd = FD;               \
}
The macro itself isn't hard to understand, but the important thing to note is that in older versions of libc it doesn't perform sanity checks on the size or on the pointers being written to. What this means in your case is that, without any sort of address randomization, you can predictably and reliably determine the state of the heap and redirect an arbitrary pointer to an address of your choosing by overflowing the heap (via the strcpy here) in a specific way.
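For reference, here is a sketch of the chunk header the macro operates on; the field names follow the dlmalloc/Phrack description above, and the byte offsets assume a 32-bit build:

#include <stddef.h>

/* dlmalloc-style chunk header. On a 32-bit build, fd sits 8 bytes and
 * bk 12 bytes past the start of the chunk, which is where the "minus 12"
 * offset below comes from. */
struct malloc_chunk {
    size_t prev_size;           /* offset 0:  size of previous chunk (if free)    */
    size_t size;                /* offset 4:  chunk size plus 2 status bits       */
    struct malloc_chunk *fd;    /* offset 8:  forward pointer (free chunks only)  */
    struct malloc_chunk *bk;    /* offset 12: backward pointer (free chunks only) */
};

/* unlink(P, BK, FD) boils down to two writes:
 *   FD->bk = BK;   i.e. *(P->fd + 12) = P->bk;
 *   BK->fd = FD;   i.e. *(P->bk + 8)  = P->fd;
 * so controlling fd and bk gives a (mostly) arbitrary 4-byte write. */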
There's a few things required to get the attack to work:
the fd pointer of your block points to the address you want to overwrite, minus 12 bytes. The offset is there because the write goes through the chunk's bk field, which sits 12 bytes into the structure
The bk pointer of your block is pointing to your shellcode
The size needs to be -4. This accomplishes a few things, namely it sets the status bits in the block
So you'll have to play around with the offsets in your specific example, but the general malicious format that you're trying to pass with the strcpy here is of the format:
| junk to fill up the legitimate buffer | -4 | -4 | addr you want to overwrite -12 (0x0C) | addr you want to call instead
Note that the negative number sets the prev_size field to -4, which makes the free routine believe that the prev_size chunk actually starts in the current chunk that you control/are corrupting.
And yes, a proper explanation wouldn't be complete without mentioning that this attack doesn't work on current versions of glibc; the size has a sanity check done and the unlink method just won't work. That in combination with mitigations like address randomization make this attack not viable on anything but legacy systems. But the method described here is how I did that challenge ;)
Note that most of the techniques explained in the Malloc Maleficarum are now protected against; glibc has improved its handling of all those double-free scenarios a lot.
If you want to improve your knowledge of the Malloc Maleficarum techniques, read Malloc Des-Maleficarum and The House of Lore: Reloaded, written by blackngel. You can find both texts in Phrack.
Malloc Des-Maleficarum
I'm also working on this, and I can tell you that, for example, the House of Mind is no longer exploitable, at least not as explained in those texts, although it might be possible to bypass the new restrictions added to the code.
I'll add that the easiest way to execute your code is to overwrite the .dtors address, so that your code is always executed once the program finishes.
If you download the glibc code and study the critical zones of malloc.c, etc., you will find code checks that are not documented in the texts mentioned above. Those checks were included to stop the double-free party.
On the other hand, the presentation by Justin N. Ferguson (Understanding the Heap by Breaking It), which you can find on YouTube (BlackHat 2007), is perfect for understanding all the heap mechanics, but I must admit that the techniques shown are far from reliable; at least, though, he opens a new field of heap exploitation.
Understanding the heap by breaking it
Anyway, I'm also working on this, so if you want to contact me, we can share our progress. You can reach me at the overflowedminds.net domain as newlog (build the mail address yourself ^^).
Heap overflows are tricky to pull off and are very heavily heap-layout dependent, although it looks like you're going after the Windows CRT heap, which has lots of mitigations in place specifically to stop this type of attack.
If you really do want to do this kind of thing, you need to be happy jumping into WinDbg and stepping into functions like free to see exactly what is going on inside free, and hence what kind of control you might be able to achieve via the heap overflow of the previous value.
I won't give you any more specific help than that for the simple reason that demonstrating a heap overflow is usually enough for defensive security - defensive security experts can report a heap overflow without needing to actually fully exploit it. The only people who do need to fully exploit a heap-overflow all the way to remote code execution are people exploiting bugs offensively, and if you want to do that, you're on your own.
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
C programming : How does free know how much to free?
Hello All,
How does the OS come to know how much memory to free when we call free(pointer)?
I mean, we are not providing any size, only a pointer, to the free statement.
How is the size handled internally?
Thanks,
Neel
The OS won't have a clue, as free is not a system call. However, your C library's memory allocation system will have recorded the size in some way when the memory was originally allocated by malloc(), so it knows how much to free.
The size is stored internally in the allocator, and the pointer you pass to free is used to reach that data. A very basic approach is to store the size 4 bytes before the pointer, so subtracting 4 from the pointer gives you a pointer to its size.
Note that the OS doesn't handle this directly; it's implemented by your C/C++ runtime allocator.
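As a toy illustration of the "size stored just before the pointer" idea (this is not how any particular libc actually lays out its metadata; sized_alloc and sized_free are made-up wrappers around malloc/free):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stash the requested size just before the pointer handed to the caller. */
static void *sized_alloc(size_t n)
{
    size_t *block = malloc(sizeof(size_t) + n);
    if (block == NULL)
        return NULL;
    block[0] = n;            /* header: remember the payload size */
    return block + 1;        /* caller only sees the payload      */
}

static void sized_free(void *p)
{
    if (p == NULL)
        return;
    size_t *block = (size_t *)p - 1;   /* step back to the header */
    printf("freeing %zu bytes\n", block[0]);
    free(block);
}

int main(void)
{
    char *s = sized_alloc(16);
    strcpy(s, "hello");
    sized_free(s);           /* prints "freeing 16 bytes" */
    return 0;
}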
When you call malloc, the C library will automatically carve a space out for you on the heap. Because things created on the heap are created dynamically, what is on the heap at any given point in time is not known as it is for the stack. So the library will keep track of all the memory that you have allocated on the heap.
At some point your heap might look like this:
p---+
V
---------------------------------------
... | used (4) | used (10) | used (8) | ...
---------------------------------------
The library will keep track of how much memory is allocated for each block. In this case, the pointer p points to the start of the middle block.
If we do the following call:
free(p);
then the library will free this space for you on the heap, like so...
p---+
V
----------------------------------------
... | used (4) | unused(10) | used (8) | ...
----------------------------------------
Now, the next time that you are looking for some space, say with a call like:
void* ptr = malloc(10);
The newly unused space may be allocated to your program again, which will allow us to reduce the overall amount of memory our program uses.
ptr---+
V
----------------------------------------
... | used (4) | used(10) | used (8) | ...
----------------------------------------
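A small experiment along these lines shows the reuse in practice; whether the second allocation really lands on the freed block is allocator-dependent, so treat matching addresses as an observation, not a guarantee:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *p = malloc(10);
    uintptr_t old = (uintptr_t)p;   /* remember the address before freeing */
    free(p);

    void *q = malloc(10);           /* often hands back the same block */
    printf("first:  %#lx\n", (unsigned long)old);
    printf("second: %p\n", q);

    free(q);
    return 0;
}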
How your library internally manages the sizes can vary. A simple way to implement this would be to add a small amount of extra space (we'll say 1 byte for the example) at the beginning of each allocated block to hold that block's size. So our previous stretch of heap memory would look like this:
bytes: 1 4 1 10 1 8
--------------------------------
... |4| used |10| used |8| used | ...
--------------------------------
^
+---ptr
Now, if we say that block sizes will be rounded up to be divisible by 2, then we have a spare bit at the end of the size (because we can always assume it to be 0), which we can conveniently use to record whether the corresponding block is used or unused.
When we pass a pointer in free:
free(ptr);
The library would step the given pointer back one byte and flip the used/unused bit to unused. In this specific case, we don't even have to know the size of the block in order to free it. It only becomes an issue when we try to reallocate the same amount of data: then the malloc call walks down the line, checking whether the next block is free. If it is free and of the right size, that block is returned to the user; otherwise a new block is cut at the end of the heap, with more space allocated from the OS if necessary.