Stack memory layout in C

I have written a small program for learning about the stack pointer in C. Usually, the stack pointer points to the last address of RAM. For example, I ran this code on Linux, which has the stack pointer at the last address (0x7fffffffffff), but the variables are stored at these addresses (for this program):
c --> 0x7ffc67ca5c94
b --> 0x7ffc67ca5c98
a --> 0x7ffc67ca5c9c
Code:
#include <stdio.h>

void fun(int *ptr)
{
    printf("c --> %p\n", (void *)ptr);
    ptr++;
    printf("b --> %p\n", (void *)ptr);
    ptr++;
    printf("a --> %p\n", (void *)ptr);
}

int main(void)
{
    int a = 10, b = 20, c = 30;
    printf("%p %p %p\n", (void *)&a, (void *)&b, (void *)&c);
    fun(&c);
    return 0;
}
Output for this program:
0x7ffc67ca5c9c 0x7ffc67ca5c98 0x7ffc67ca5c94
c --> 0x7ffc67ca5c94
b --> 0x7ffc67ca5c98
a --> 0x7ffc67ca5c9c
My questions are:
Why are the variables not stored at the very top of the stack (0x7fffffffffff)? Why do they skip some memory locations before being stored? Is there a valid reason for this behavior?
Why does the stack pointer address have only six bytes?
I am working on a 64-bit machine with a 32-bit GCC compiler.

which has the stack pointer at the last address (0x7fffffffffff)
If you were running a 32-bit process, the last address (minus the reserved area) would be 0x7fff ffff under the usual 2 GB/2 GB split. But 32-bit Linux typically reserves only a 1 GB kernel address area, so the stack would actually start at 0xc000 0000 (TASK_SIZE).
What you are seeing here, instead, is the curious split of the x86-64 address space layout. There, the user address space indeed ends at 0x0000 7fff ffff ffff, with 0xffff 8000 0000 0000 and above reserved for the kernel.
Current MMUs actually enforce this: 0xfffe ff80 0000 0000 and the like are not valid addresses, because bits 47-63 must all be equal to form a canonical form address.
Why does the stack pointer address have only six bytes?
From the output of your program, it does not.
The 4-byte spacing between the addresses reflects the size each variable takes on the stack, not the pointer size. The six-byte addresses printed by printf() are actually 64-bit addresses with the leading zeros cut off (thanks to @Jonathan Leffler for spotting this).
Indeed, sizeof(int *) == 8, but sizeof(int) == 4, because even 64-bit Linux has 32-bit ints (only long is 64 bits). It's quite easy to miss that, though.
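A minimal sketch to make both facts visible (assuming 64-bit Linux; PRIxPTR from <inttypes.h> forces all 16 hex digits, while plain %p typically drops the leading zeros):
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int a = 10;
    int *p = &a;
    printf("sizeof(int)   = %zu\n", sizeof(int));   /* 4 on 64-bit Linux */
    printf("sizeof(int *) = %zu\n", sizeof(int *)); /* 8 */
    printf("plain %%p     : %p\n", (void *)p);      /* leading zeros dropped */
    printf("full width    : 0x%016" PRIxPTR "\n", (uintptr_t)p);
    return 0;
}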
Why are the variables not stored at the very top of the stack (0x7fffffffffff)? Why do they skip some memory locations before being stored? Is there a valid reason for this behavior?
If you look at a typical process memory layout, quite a lot goes into that address space before the user stack starts (environment strings, the argument vector, the auxiliary vector, and so on). Since whole pages may be needed for protection, there can be some overhead; add library startup code and you can account for quite a few bytes of memory.
Most of the code there is probably inherited copy-on-write (or even read-only) from the parent process.
Edit: on x86-64, any kernel-exported code probably goes into the higher memory area, but I won't claim what I have not verified ;-).
As a side note: when I compile and run your code on 64-bit FreeBSD 10.2, I get this:
0x7fffffffe5c8 0x7fffffffe5c4 0x7fffffffe5c0
c --> 0x7fffffffe5c0
b --> 0x7fffffffe5c4
a --> 0x7fffffffe5c8
which is similar to your output, even though FreeBSD seems to position the stack differently.
Running in 32-bit mode, I get this:
0xffffd788 0xffffd784 0xffffd780
c --> 0xffffd780
b --> 0xffffd784
a --> 0xffffd788
The latter is probably really fun to explain (e.g. where is my kernel address space?).

Variables can be stored at any address, and on this system those addresses contain more than 32 bits, so 64-bit addresses are assigned.


How much stack space does C exactly use for a function activation record when calling it?

Environment: gcc version 6.3.0 (MinGW.org GCC-6.3.0-1) on Windows 10.
I compile and run the code from the command line.
Here is my code:
#include <stdio.h>

int func(void)
{
    int c;
    printf("stack top in func \t%p\n", (void *)&c);
    return 1;
}

int main(void)
{
    int arr[0]; /* zero-length array: a GCC extension, not ISO C */
    int i;
    printf("stack top before func \t%p\n", (void *)&i);
    i = func();
    int j;
    printf("stack top after func \t%p\n", (void *)&j);
    return 0;
}
Here is result:
stack top before func 0061FF2C
stack top in func 0061FEFC
stack top after func 0061FF28
The gap size between the stack top while in function and stack top out of function is 48 bytes.
I then changed the size of arr to 1, and the result is:
stack top before func 0061FF28
stack top in func 0061FEFC
stack top after func 0061FF24
The gap just shrank, and the stack top while in the function stayed put. The gap size is now 44 bytes.
It stops shrinking when the size of arr is 3.
The new gap size is 52 bytes.
Is that some sort of memory-management strategy?
What's the benefit of using 52 bytes when 44 would do, given that the sizes of the variables before the function call are known at compile time?
I think you are making some unfounded assumptions on how the stack, and the compiler, work. Namely:
that variables are allocated at the moment you declare them,
that the "last" variable takes up the "top" of the stack,
that the variables only take as much space as they need,
that this has a clear and deterministic answer.
Here's a rough idea of what happens when you call a function in C, gcc, x86 platform, no optimizations:
The parameters (if any) are stored in registers and/or the stack. The details are different between 32 and 64 bit, integers/pointers, floats, and structs of different sizes, number of arguments, vararg, and more.
The call instruction is executed, which pushes the return address onto the stack (4 bytes in 32-bit mode, 8 bytes in 64-bit mode) and redirects the processor to the new address.
The stack pointer is saved in the BP register, after pushing the original value of BP (4 or 8 bytes).
The stack pointer is decremented by enough bytes to accommodate all local variables.
Upon returning,
The value of the BP register overwrites the stack pointer, negating step 4 automatically. Then the original value of BP is popped.
The ret instruction is taken, popping the return address and jumping there.
It should be noted that this is by no means universal, or guaranteed. "Simple" functions may be optimized to skip steps 3, 4 and 5. Step 4 can in principle happen multiple times. Additional magic can be done to the stack pointer, like aligning it to a particular power-of-two boundary (such as the 16-byte alignment required by SSE instruction operands), allocating something called the red zone, the alloca function, etc. Many exceptions and special cases exist. More details will depend on gcc command-line parameters, or their built-in defaults per distribution. Other compilers may follow slightly different, yet compatible, conventions. But let's stick to this model.
What's important to notice is that all the local variables are often allocated all together in step 4, and the size that's taken may be either the total size required or more. For example, it may be mandated by the conventions that the compiler makes sure that the stack pointer is a multiple of 16 at any point (so that the functions themselves can rely on this), in which case it rounds up to the nearest multiple (also with regard to what had been taken in steps 1 through 3). Within this zone the locals are assigned addresses (offset from the BP or SP) such as to respect their size and alignment requirements.
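As a rough sketch of that rounding (just the arithmetic, not what any particular compiler literally emits):
#include <stdio.h>

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

int main(void)
{
    /* e.g. three 4-byte ints in a frame kept 16-byte aligned */
    printf("%zu\n", round_up(3 * sizeof(int), 16)); /* prints 16 */
    return 0;
}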
Your example, especially the code in main, cannot work because the compiler won't follow your wish to allocate the space for j only after returning from func. It happens along with arr and i at the beginning of the function, and the order of the variables is unspecified, likely chosen so that they can be best "packed" into the space that's available, with ints taking addresses at 32- or 64-bit boundaries. Even if it did, the calculation would be mistaken by taking the address of j as the "stack top after func": at best, it would be "stack top after func and allocation". In general, the "stack top after func" must be the same as the "stack top before func" in the C calling convention.
In order to get a more concrete idea in your function, I would suggest either:
Studying the assembly after compilation. The tool at godbolt.com is great for this: here's your code compiled by gcc 8.2 in x86-64 as shown there.
The stack pointer should be reduced by 16 (line 6) plus 8 (the push of RBP at line 4) plus whatever the call at line 28 required to store the return address, which is 8 in 64-bit mode.
Using a debugger:
(gdb) b 11
(gdb) b 4
(gdb) run
Starting program: [redacted]
stack top before func 0x7fffffffd2dc
Breakpoint 1, main () at a.c:11
11 i = func();
(gdb) print $rsp
$1 = (void *) 0x7fffffffd2d0
(gdb) c
Continuing.
Breakpoint 2, func () at a.c:4
4 printf("stack top in func \t%p\n", &c);
(gdb) print $rsp
$2 = (void *) 0x7fffffffd2b0
You can see here that rsp was reduced by 0x20 == 32.
It is because of gcc's stack alignment.
In gcc, the stack alignment is 16 bytes by default, at least in my environment. I changed it to 4 bytes with the compile option -mpreferred-stack-boundary=2, the same as the size of an int.
Then the stack top in the function moves every single time I declare a new int.
Thanks to Jabberwocky and Korni, whose comments introduced an area I didn't know about before.
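Here is a sketch for reproducing the experiment (assuming a 32-bit x86 gcc; 64-bit targets enforce a larger ABI minimum for the stack boundary):
/*
 * Compile and run twice, increasing N:
 *   gcc -m32 -DN=1 probe.c && ./a.out
 *   gcc -m32 -DN=1 -mpreferred-stack-boundary=2 probe.c && ./a.out
 * With the 4-byte boundary, the in-function stack top moves with every
 * added int instead of jumping in 16-byte steps.
 */
#include <stdio.h>

#ifndef N
#define N 1
#endif

int func(void)
{
    int c;
    printf("stack top in func \t%p\n", (void *)&c);
    return 1;
}

int main(void)
{
    int arr[N]; /* N ints of locals before the call */
    int i;
    arr[0] = 0;
    printf("stack top before func \t%p\n", (void *)&i);
    i = func();
    (void)i;
    return 0;
}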

Why does the buffer overflow in this code behave differently from what I expect?

I have this Program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void main(void) {
    char *buffer1 = malloc(sizeof(char));
    char *buffer2 = malloc(sizeof(char));
    strcpy(buffer2, "AA");
    printf("before: buffer1 %s\n", buffer1);
    printf("before: buffer2 %s\n", buffer2);
    printf("address, buffer1 %p\n", &buffer1);
    printf("address, buffer2 %p\n", &buffer2);
    strcpy(buffer1, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB");
    printf("after: buffer1 %s\n", buffer1);
    printf("after: buffer2 %s\n", buffer2);
}
Which prints:
before: buffer1
before: buffer2 AA
address, buffer1 0x7ffc700460d8
address, buffer2 0x7ffc700460d0
after: buffer1 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
after: buffer2 B
What I expect this code to do:
As a char is 8 bits long, I expect that both buffers have a size of 1 byte/8 bits.
As one ASCII char is 7 bits long, I expect that two characters fit into each buffer.
As I allocate the two buffers of one byte directly after each other, I expect them to be directly next to each other in memory. Therefore, I expect the difference between their addresses to be 1 (since memory is addressed by byte?), and not 8 as my little program has printed.
As they are directly next to each other in memory, I expect buffer2 to be overflowed when I do the strcpy: the first two Bs are written to buffer1 and the rest overflow into buffer2. Therefore, I'd expect that strcpy(buffer1, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"); produces either:
a buffer overflow in buffer2, so that it has the value BBBBBBBBBBBBBBBBBBBBBBBBBBBBB or so.
How I calculated that: the number of Bs that were strcpy'd, minus the 4 Bs that fit in the two buffers.
Or a segmentation fault. I have only allocated 2 bytes (since buffer1 and buffer2 together are 2 bytes). Since the remaining Bs fit into neither buffer1 nor buffer2 (because both are already filled), they would overflow into the memory after buffer2, and because I have not allocated that, I'd expect a segmentation fault.
Therefore, I want to ask: why does my program act differently from my expectations? Where did I misunderstand things?
I have an x86_64 architecture, and the above program was compiled with gcc version 6.3.1 20170306 (GCC).
What I am not asking about:
I know that strcpy does no bounds checking, and the usage is intentional; I want to investigate buffer overflows and the like. Therefore, please don't write an answer/comment saying that I should use a different method than strcpy.
First, please read What should main() return in C and C++?
Now focus on how you are allocating memory.
How much memory does malloc(1) allocate?
8 bytes of overhead are added to our need for a single byte, and the
total is smaller than the minimum of 32, so that's our answer:
malloc(1) allocates 32 bytes.
which makes the basis of your expectations shaky.
Note: "malloc(1) allocates 32 bytes" may be true for the implementation discussed at that link, but it is extremely implementation-dependent and will differ between systems.
On the other hand, if you had done:
char buffer1[1], buffer2[1];
instead of dynamically allocating memory, you would see different results. For example, on my system:
Georgioss-MacBook-Pro:~ gsamaras$ ./a.out // with malloc
before: buffer1
before: buffer2 AA
address, buffer1 0x7fff5ecb6bd8
address, buffer2 0x7fff5ecb6bd0
after: buffer1 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
after: buffer2 BBBBBBBBBBBBBBBBB
Georgioss-MacBook-Pro:~ gsamaras$ gcc -Wall main.c // no malloc
Georgioss-MacBook-Pro:~ gsamaras$ ./a.out
Abort trap: 6
Tip: The size has not officially been rounded up; accessing bytes beyond the requested size has Undefined Behavior. (If it were officially rounded up, this would have implementation-defined behavior.)
As a char is 8 bits long, ...
This is correct for the stated architecture and operating system. (The C standard allows char to be more than 8 bits long, but this is very rare nowadays; the only example I know of is the TMS320 family of DSPs, where char may be 16 bits. It's not allowed to be smaller.)
Note that sizeof(char) == 1 by definition and therefore it is generally considered bad style to write sizeof(char) or foo * sizeof(char) in your code.
... i expect that both buffers have the size of 1 byte/8 bits.
This is also correct (but see below).
As one ASCII char is 7 bits long, I expect that two characters fit into each buffer.
This is not correct, for two reasons. First, nobody uses 7-bit ASCII anymore. Each character is in fact eight bits long. Second, two seven-bit characters do not fit into one eight-bit buffer. I see that there is some confusion on this point in the comments on the question, so let me attempt to explain further: Seven bits can represent 2^7 = 128 different values, just enough room for the 128 different characters defined by the original ASCII standard. Two seven-bit characters, together, can have 128 * 128 = 16384 = 2^14 different values; that requires 14 bits to represent, and will not fit into eight bits. You seem to have thought it was only 2 * 128 = 2^8 = 256, which would fit into eight bits, but that's not right; it would mean that once you saw the first character, there were only two possibilities for the second character, not 128.
As I allocate the two buffers of one byte directly after each other, I expect them to be directly next to each other in memory. Therefore, I expect the difference between their addresses to be 1 (since memory is addressed by byte?), and not 8 as my little program has printed.
As you have observed for yourself, your expectations are incorrect.
malloc is not required to put consecutive allocations next to each other; in fact, "are these allocations next to each other" may not be a meaningful question. The C standard goes out of its way to avoid requiring there to be any meaningful comparison between two pointers that don't point into the same array.
Now, you are working on a system with a "flat address space", so it is meaningful to compare pointers from successive allocations (provided you do it in your own brain, not with code) and there is a logical explanation for the gap between the allocations, but first I have to point out that you printed the wrong addresses:
printf("address, buffer1 %p\n", &buffer1);
printf("address, buffer2 %p\n", &buffer2);
This prints the addresses of the pointer variables, not the addresses of the buffers. You should have written
printf("address, buffer1 %p\n", (void *)buffer1);
printf("address, buffer2 %p\n", (void *)buffer2);
(The cast to void * is required because printf takes a variable argument list.) If you had written that you would have seen output similar to
address, buffer1 0x55583d9bb010
address, buffer2 0x55583d9bb030
and the important thing to notice is that these allocations differ by 32 bytes, and not only that, they're both evenly divisible by 16.
malloc is required to produce buffers that are aligned as required for any type, even if you can't fit a value of that type into the allocation. An address is aligned to some number of bytes if it's evenly divisible by that number. On your system, the maximum alignment requirement is 16; you can confirm this by running this program...
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
    printf("%zu\n", alignof(max_align_t));
    return 0;
}
So that means all addresses returned by malloc must be evenly divisible by 16. Therefore, when you ask malloc for two one-byte buffers, it has to leave a gap of at least fifteen bytes between them (here the gap is 31 bytes, since the second allocation starts 32 bytes after the first). This does not mean that malloc rounded the size up; the C standard specifically forbids you to access the bytes in the gap. (I'm not aware of any modern, commercial CPUs that can enforce that prohibition, but debugging tools like valgrind will, and there have been experimental CPU designs that can do it. Also, often the space immediately before or after a malloc block contains data used internally by the malloc implementation, which you must not tamper with.)
There's a similar gap after the second allocation.
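Here is a self-contained sketch of the corrected experiment (going through uintptr_t keeps the subtraction well-defined integer arithmetic; subtracting unrelated pointers directly would not be):
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *b1 = malloc(1);
    char *b2 = malloc(1);
    printf("b1  = %p\n", (void *)b1);
    printf("b2  = %p\n", (void *)b2);
    printf("gap = %llu bytes\n",
           (unsigned long long)((uintptr_t)b2 - (uintptr_t)b1)); /* typically 32 with glibc */
    free(b1);
    free(b2);
    return 0;
}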
As they are directly next to each other in memory, I expect buffer2 to be overflowed when I do the strcpy: the first two Bs are written to buffer1 and the rest overflow into buffer2.
As previously discussed, they are not directly next to each other in memory, and each B takes up eight bits. One B is written to your first allocation, the next 31 to the gap between the two allocations, the 33rd (and last) B to the second allocation, and the terminating NUL to the byte just after it. That is also why buffer2 prints as just "B" afterwards.
I have only allocated 2 bytes (since buffer1 and buffer2 together are 2 bytes). Since the remaining Bs fit into neither buffer1 nor buffer2 (because both are already filled), they would overflow into the memory after buffer2, and because I have not allocated that, I'd expect a segmentation fault.
We've already discussed why your calculations were incorrect, but you did scribble well past the end of your one-byte allocations, so why no segfault? This is because, at the level of operating system primitives, memory is allocated to applications in units called "pages", which are larger than the amount of memory you asked for. The CPU can only detect a buffer overrun and trigger a segmentation fault if the overrun crosses a page boundary. You just didn't go far enough. I experimented with your program on my computer, which is very similar, and I needed to write 132 kilobytes (a kilobyte is 1024 bytes; some people say that's supposed to be called a kibibyte, but they are wrong) beyond the end of buffer1 to get a segfault. Pages on my computer are only 4 kilobytes each, but malloc asks the OS for memory in even larger chunks because system calls are expensive.
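You can query the page size mentioned here on your own system with a tiny POSIX probe:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE)); /* e.g. 4096 */
    return 0;
}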
Not getting a prompt segfault does not mean you are safe; there is an excellent chance you clobbered malloc's internal data, or another allocation somewhere in the "space beyond". If I take your original program and add a call to free(buffer1) at the end, it crashes in there.
malloc does not guarantee location in memory. You cannot be sure, even with back-to-back calls, that the memory will be contiguous. In addition, malloc often allocates more space than necessary. A segfault could well occur with your code, but is not guaranteed.
printf with the %s specifier prints characters from the pointer until a NUL (ASCII 0) character is encountered.
Remember, buffer overflow is undefined behavior, which means just that: you do not know exactly what will happen.

How to eliminate stack overflow error even after allocating a large stack size?

Unhandled exception at 0x00AA9379 in A.exe: 0xC00000FD: Stack overflow (parameters: 0x00000000, 0x00802000).
I am facing a stack overflow error. I'm using VS15 as my IDE. I tried to allocate more memory for the stack: under Project >> Properties >> Linker >> System >> Stack allocation, I allocated 4 GB for the stack. But execution still breaks in chkstk.asm at this line:
99 sub eax, _PAGESIZE_ ; decrease by PAGESIZE.
The problem isn't solved. How do I know in advance how much stack space I will need? I've used dynamic memory allocation for all of the large variables, but that did not solve the problem. Here is a verifiable example...
Here is my code:
#include <stdio.h>

int main(void)
{
    FILE *fp1;
    char datfile[132];
    int nod[1024 * 1024];                        /* 4 MiB  */
    int Enod[8 * 1024 * 1024];                   /* 32 MiB */
    double nodS[1024 * 1024], nodF[1024 * 1024]; /* 16 MiB */
    return 0;
}
The default stack size on Windows using MS compilers is 1 MiB. You put on the stack several arrays that together need roughly 52 MiB (4 MiB for nod, 32 MiB for Enod, and 8 MiB each for nodS and nodF).
You said you increased stack size to 4GB. In that case, the following becomes relevant:
The reserved memory size represents the total stack allocation in virtual memory. As such, the reserved size is limited to the virtual address range. The initially committed pages do not utilize physical memory until they are referenced; however, they do remove pages from the system total commit limit, which is the size of the page file plus the size of the physical memory. The system commits additional pages from the reserved stack memory as they are needed, until either the stack reaches the reserved size minus one page (which is used as a guard page to prevent stack overflow) or the system is so low on memory that the operation fails. (emphasis mine)
In addition, Intel's notes list
Given these definitions, the following lists the limits on 32-bit and 64-bit variants of Windows:
32-bit
Stack data - 1GB (the stack size is set by the linker, the default is 1MB. This can be increased using the Linker property System > Stack Reserve Size)
...
64-bit
Stack data - 1GB (the stack size is set by the linker, the default is 1MB. This can be increased using the Linker property System > Stack Reserve Size)
...
Note that the limit on static and stack data is the same in both 32-bit and 64-bit variants. This is due to the format of the Windows Portable Executable (PE) file type, which is used to describe EXEs and DLLs as laid out by the linker. It has 32-bit fields for image section offsets and lengths and was not extended for 64-bit variants of Windows. As on 32-bit Windows, static data and stack share the same first 2GB of address space. (emphasis mine)
Finally, look closely at the instruction in the beginning of your post:
But the error continues to stop at chkstk.asm at this line
99 sub eax, _PAGESIZE_ ; decrease by PAGESIZE
To look more closely at this, I wrote a small program which would generate the same effect as yours (it only "worked" with a 32-bit build; I am not sure what needs to be done to make a 64-bit executable crash):
#include <stdio.h>
#include <stdlib.h>

#define MYSIZE (8 * 1024 * 1024)

int main(void)
{
    int x[MYSIZE]; /* 32 MiB on the stack */
    x[MYSIZE - 1] = rand();
    printf("%d\n", x[MYSIZE - 1]);
    return 0;
}
I set the stack size in the linker options to 4294967296 and then ran the program under the debugger. It crashed with a stack overflow, and broke at the same instruction you observed. Scrolling up in the stack checking code, I noted the following comments:
; Handle allocation size that results in wraparound.
; Wraparound will result in StackOverflow exception.
As far as I can figure out, the routine is trying to move the top-of-stack down by PAGESIZE at a time to reserve the applicable stack size.
So, trying to set the stack size to 4 GB seems to be the root cause of your immediate problem. You can try setting it to 1 GB, which might solve that problem. Indeed, I changed the stack size for the program above to 1073741823 (= 1024 * 1024 * 1024 - 1), and I did not get a stack overflow. I think the fact that the linker does not at the very least warn about an invalid stack size value is a bug.
Indeed, looking at the hexdump of an executable built with /STACK:1000000000 and comparing it to one built with /STACK:4294967296 highlights the problem:
00000150: 0000 0000 0300 4081 00ca 9a3b 0010 0000 ......#....;....
00000150: 0000 0000 0300 4081 0000 0000 0010 0000 ......#.........
Note that 0x3b9aca00 is hexadecimal for 1,000,000,000. Looking at the header format, those bytes correspond to the SizeOfStackReserve entry. That is, when you set the stack size to 4 GB (0x1 0000 0000, which truncates to zero in a 32-bit field), the effect is to set it to zero.
While setting a stack size to larger but still unsupported sizes does result in a positive value being set in the executable header, e.g.:
cl main.c /link /STACK:0xdeadbead
xxd main.exe |more
...
00000150: 0000 0000 0300 4081 b0be adde 0010 0000 ......#.........
...
the resulting executable cannot be run:
C:\...> main
Not enough storage is available to process this command.
However even though setting the stack to a smaller but still large size may enable your program to run, relying on a huge stack is not necessarily a good idea. The immediate alternative is to allocate those arrays on the heap using malloc and remembering to free them.
That is, instead of
int Enod[8 * 1024 * 1024];
you need to declare int *Enod and then allocate memory for the array using
Enod = malloc(8 * sizeof(*Enod) * 1024 * 1024);
/* remember to check that Enod is not NULL */
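Put together, a sketch of the whole example with every large array moved to the heap (sizes taken from the question; the NNOD/NENOD names are just for readability):
#include <stdio.h>
#include <stdlib.h>

#define NNOD  (1024 * 1024)
#define NENOD (8 * 1024 * 1024)

int main(void)
{
    int    *nod  = malloc(NNOD  * sizeof(*nod));
    int    *Enod = malloc(NENOD * sizeof(*Enod));
    double *nodS = malloc(NNOD  * sizeof(*nodS));
    double *nodF = malloc(NNOD  * sizeof(*nodF));

    if (!nod || !Enod || !nodS || !nodF) {
        fprintf(stderr, "allocation failed\n");
        free(nod); free(Enod); free(nodS); free(nodF);
        return EXIT_FAILURE;
    }

    /* ... use the arrays as before ... */

    free(nod); free(Enod); free(nodS); free(nodF);
    return EXIT_SUCCESS;
}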
Answers to this question discuss why the stack is much more constrained.
In addition, your code would benefit from replacing arbitrary-looking numbers with meaningful mnemonics, using #defines.
Replace your on-stack arrays with malloc'd arrays instead. Boom! Problem solved.
Of course you may still wind up running out of memory, but at least you won't have to work as hard to avoid it, and it'd be heap memory, which IMO is easier to get more of.
You could also preallocate some of those malloc'd arrays so the code wouldn't be doing as much work on each function call.

If a pointer's address is 64 bits, why does it appear as 6 bytes when printed?

#include <stdio.h>
int main(void)
{
    int *ptr;
    printf("the value of ptr is %p\n", (void *)ptr);
}
This gives me 0x7fffbd8ce900, which is only 6 bytes. Should it be 8 bytes (64bit)?
Although a pointer is 64 bits, current processors actually only support 48 bits, so the upper two bytes of an address are always either 0000 or (due to sign-extension) FFFF.
In the future, if 48 bits is no longer enough, new processors can add support for 56-bit or 64-bit virtual addresses, and existing programs will be able to utilize the additional space since they're already using 64-bit pointers.
That just means the top two bytes are zero (which, incidentally, is currently guaranteed for user-space addresses on x86-64 chips, though that doesn't mean anything in this case, since your pointer is not initialized). %p is allowed to drop leading zeros, just like any other numeric conversion; %016p is not. So this should work (strictly speaking, width and flags with %p are not guaranteed by the C standard, but common implementations accept them):
printf("the value of ptr is %016p\n", (void *)ptr);
Because the 6-byte address is just the virtual address (an offset from the actual physical address). In the x86 physical architecture, memory is divided into portions that may be addressed by a single index register without changing a 16-bit segment selector. In real mode of an x86 CPU, a segment always uses a 16-bit (2-byte) segment selector, which is decided dynamically by the operating system at the very beginning, when your program starts to run (i.e., when the actual running process is created).
Hence, if your variable has the 48-bit address 0x7fffbd8ce900 and your program has the segment selector 0x08af, the real address of the variable is (0x08af << 48) + 0x7fffbd8ce900 = 0x08af7fffbd8ce900, which is 64 bits.
For further reading, see:
x86 memory segmentation

Map memory to another address

X86-64, Linux, Windows.
Consider that I'd want to make some sort of "free lunch for tagged pointers". Basically, I want to have two pointers that point to the same actual memory block but whose bits are different. (For example, I want one bit to be used by the GC for collection, or for some other reason.)
intptr_t ptr = (intptr_t)malloc(size);
intptr_t ptr2 = map(ptr | GC_FLAG_REACHABLE); // some magic call
int *p = (int *)ptr;
int *p2 = (int *)ptr2;
*p = 10;
*p2 = 20;
assert(*p == 20);
assert(p != p2);
On Linux, mmap() the same file twice. Same thing on Windows really, but it has its own set of functions for that.
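A minimal sketch of that approach on Linux (memfd_create is Linux-specific; shm_open would be the portable POSIX route; error checking omitted for brevity):
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = memfd_create("alias", 0); /* anonymous file to map twice */
    ftruncate(fd, 4096);

    int *p  = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int *p2 = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    *p = 10;
    *p2 = 20;         /* a write through the second mapping... */
    assert(*p == 20); /* ...is visible through the first       */
    assert(p != p2);  /* yet the virtual addresses differ      */
    printf("p = %p, p2 = %p, *p = %d\n", (void *)p, (void *)p2, *p);
    return 0;
}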
Mapping the same memory (mmap on POSIX as Ignacio mentions, MapViewOfFile on Windows) to multiple virtual addresses may provide you some interesting coherency puzzles (are writes at one address visible when read at another address?). Or maybe not. I'm not sure what all the platform guarantees are.
More commonly, one simply reserves a few bits in the pointer and shifts things around as necessary.
If all your objects are aligned to 8-byte boundaries, it's common to simply store tags in the 3 least-significant bits of a pointer, and mask them off before dereferencing (as thkala mentions). If you choose a higher alignment, such as 16 bytes or 32 bytes, then there are 4 or 5 least-significant bits that can be used for tagging. Equivalently, choose a few most-significant bits for tagging, and shift them off before dereferencing. (Sometimes non-contiguous bits are used, for example when packing pointers into the signalling NaNs of IEEE-754 floats (2^23 values) or doubles (2^51 values).)
Continuing on the high end of the pointer, current implementations of x86-64 use at most 48 bits out of a 64-bit pointer (0x0000000000000000-0x00007fffffffffff + 0xffff800000000000-0xffffffffffffffff) and Linux and Windows only hand out addresses in the first range to userspace, leaving 17 most-significant bits that can be safely masked off. (This is neither portable nor guaranteed to remain true in the future, though.)
Another approach is to stop considering "pointers" and simply use indices into a larger memory array, as the JVM does with -XX:+UseCompressedOops. If you've allocated a 512 MB pool and are storing 8-byte-aligned objects, there are 2^26 possible object locations, so a 32-bit value has 6 bits to spare in addition to the index. A dereference will require adding the index times the alignment to the base address of the array, saved elsewhere (it's the same for every "pointer"). If you look at things carefully, this is simply a generalization of the previous technique (which always has base at 0, where things line up with real pointers).
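A sketch of the index idea (all names invented; assumes a single pool and 8-byte alignment):
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ALIGN 8

static char *pool; /* base address, stored once for all "pointers" */

static void *decode(uint32_t idx)
{
    return pool + (uintptr_t)idx * ALIGN;
}

static uint32_t encode(void *p)
{
    return (uint32_t)(((char *)p - pool) / ALIGN);
}

int main(void)
{
    pool = malloc(1 << 20); /* a 1 MiB pool for the example */
    void *obj = pool + 128; /* some 8-byte-aligned "object" in it */
    uint32_t ref = encode(obj);
    assert(decode(ref) == obj);
    printf("index %u -> %p\n", ref, decode(ref));
    free(pool);
    return 0;
}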
Once upon a time I worked on a Prolog implementation that used the following technique to have spare bits in a pointer:
Allocate a memory area with a known alignment. malloc() usually allocates memory with a 4-byte or 8-byte alignment. If necessary, use posix_memalign() to get areas with a higher alignment size.
Since the resulting pointer is aligned to intervals of multiple bytes, but it represents byte-accurate addresses, you have a few spare bits that will by definition be zero in the memory area pointer. For example a 4-byte alignment gives you two spare bits on the LSB side of the pointer.
You OR (|) your flags with those bits and now have a tagged pointer.
As long as you take care to properly mask the pointer before using it for memory access, you should be perfectly fine.
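A sketch of those steps in C (the TAG_* names are invented; assumes the allocation is at least 8-byte aligned, so the 3 low bits are free):
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TAG_MASK   ((uintptr_t)0x7) /* low 3 bits spare under 8-byte alignment */
#define TAG_MARKED ((uintptr_t)0x1) /* e.g. a GC mark bit */

static void *tag(void *p, uintptr_t t) { return (void *)((uintptr_t)p | t); }
static void *untag(void *p)            { return (void *)((uintptr_t)p & ~TAG_MASK); }
static uintptr_t get_tag(void *p)      { return (uintptr_t)p & TAG_MASK; }

int main(void)
{
    int *obj = malloc(sizeof *obj);
    assert(((uintptr_t)obj & TAG_MASK) == 0); /* relies on allocator alignment */

    int *tagged = tag(obj, TAG_MARKED);
    assert(get_tag(tagged) == TAG_MARKED);

    *(int *)untag(tagged) = 42; /* always mask before dereferencing */
    printf("%d\n", *obj);       /* prints 42 */
    free(obj);
    return 0;
}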
