How large is the virtual address space of a program? - c

I was reading Operating Systems: Three Easy Pieces. To learn what the virtual address space of a program looks like, I ran the following code.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
printf("location of code : %p\n", (void *) main);
printf("location of heap : %p\n", (void *) malloc(1));
int x = 3;
printf("location of stack : %p\n", (void *) &x);
return x;
}
Its output is:
location of code : 0x564eac1266fa
location of heap : 0x564ead8e5670
location of stack : 0x7fffd0e77e54
Why is the code segment located at 0x564eac1266fa? What is the large (virtual) space before it used for? Why doesn't it start at or near 0x0?
And why is the program's virtual address space so large (judging from the stack location, it's 48 bits wide)? What's the point of it?

The possible virtual address space organizations are defined by the hardware you are using, specifically the MMU it supports. The OS may then use any organization that the hardware can be coerced into using, but generally it just uses it directly (possibly with some subsetting), as that is most efficient.
The x86_64 architecture defines a 48-bit virtual address space[1], and most OSes reserve half of that for system (kernel) use, so user programs see a 47-bit address space. Within that address space, most OSes randomize the addresses used for any given program, to make exploiting bugs in programs harder.
[1] Strictly speaking, the architecture defines a 64-bit virtual address space, but then reserves (as non-canonical) all addresses whose top 17 bits are not all 0 or all 1.

You are barking up the wrong tree with what you are trying to do here. A process has multiple stacks, may have multiple heaps, and main might not be the start of the code. Viewing an address space as a code segment, stack segment, heap segment, ... as horrible operating systems books do is only going to get you confused.
Because of logical addressing, the memory mapped into the address space does not have to be contiguous.
Why is the code segment located at 0x564eac1266fa? What is the large (virtual) space before it used for? Why doesn't it start at or near 0x0?
The start of code in your process could well be at 0x564eac1266f8. The fact that you have a high address does not mean the lower addresses have been mapped into the process address space.
And why is the program's virtual address space so large (judging from the stack location, it's 48 bits wide)? What's the point of it?
Stacks generally start high and grow downward, toward lower addresses.

Related

Memory Layout of Linux (malloc() used in C, but does not start with the expected address)

I used this chunk of code:
#include <assert.h>  // assert
#include <stdio.h>   // printf
#include <stdlib.h>  // malloc
#include <unistd.h>  // getpid

int
main(int argc, char *argv[])
{
int *p; // memory for pointer is on "stack"
p = malloc(sizeof(int)); // malloc'd memory is on "heap"
assert(p != NULL);
printf("(pid:%d) addr of p: %llx\n", (int) getpid(),
(unsigned long long) &p);
printf("(pid:%d) addr stored in p: %llx\n", (int) getpid(),
(unsigned long long) p);
return 0;
}
However, I get:
addr of p: 7ffc0c53e3e0
addr stored in p: 558ae195c260
Now, first of all, since the program only does this, I do not understand why the address returned by malloc() is not 0x00200000. Second, can I say that the 7ffc0c53e3e0 address is in the heap, and the address 558ae195c260 is in the stack? Third, if my guess about 0x00200000 is wrong, is there any logic to the addresses that I get, or are they completely random?
When I think about it, the address is not even 32 bits, it is 48 bits. Even if it has to be more than 32 bits (I have 8 GB of memory, so I believe it must be more than 32 bits anyway), why is it not expressed in 64 bits, since the processor is 64-bit?
Thank you for your help.
Linux implements ASLR (address space layout randomization), so as far as I'm aware you get randomized addresses on every run unless ASLR is disabled. 558ae195c260 is the address of the block allocated on the heap by malloc(), whereas 7ffc0c53e3e0 is the address of p itself, which lives on the stack because of the declaration int *p;. 48 bits is still enough to address 256 TiB, and some architectures, such as AMD64, don't implement all 64 bits of a virtual address.
Hope that helps. If anything I've said is wrong or misleading please correct me in the comments.
No: p itself is on the stack (or would have static storage if it were a global), and it points to a block of memory on the heap.
Regarding malloc(): if you are running under an OS, where its memory comes from depends on the kernel and the C library, and on how they manage memory.
Finally, regarding address width: a 32-bit address can only distinguish 2^32 bytes (4 GB), so a single 32-bit address cannot cover 8 GB of RAM. A 64-bit pointer, on the other hand, has more than enough room to hold any address within 8 GB.

C heap address changes between runs while other addresses persist

The heap troubles me because I don't understand who creates it, who maintains it and who decides where it should be... This test shows part of my conundrum:
Source code:
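#include <stdio.h>
#include <stdlib.h> /* malloc, free (instead of the non-standard <malloc.h>) */
int a;     /* uninitialized global: .bss */
int b = 5; /* initialized global: .data */
int *getMeAPointer(void) {
int *e = malloc(4);
if (e != NULL)
*e = 5;
return e;
}
int main(void) {
a = 5;
int c = 5;
/* Writing to a hard-coded address only "works" because 0x405554 happened to be
   mapped and writable in this particular MinGW build; elsewhere it will likely crash. */
int *d = (int *) 0x405554;
*d = 5;
int *e = getMeAPointer();
/* %p (with a void * argument) is the correct conversion for printing pointers. */
printf("Address of a located in .bss is %p\n", (void *) &a);
printf("Address of b located in .data is %p\n", (void *) &b);
printf("Address of c located in stack is %p\n", (void *) &c);
printf("Address of d located in stack is %p\n", (void *) &d);
printf("Address of *d located absolutely is %p\n", (void *) d);
printf("Address of e located in stack is %p\n", (void *) &e);
printf("Address of *e located on heap is %p\n", (void *) e);
printf("Address of getMeAPointer() located in .text is %p\n", (void *) getMeAPointer);
free(e);
return 0;
}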
Example printouts:
Address of a located in .bss is 0x405068
Address of b located in .data is 0x402000
Address of c located in stack is 0x22ff1c
Address of d located in stack is 0x22ff18
Address of *d located absolutely is 0x405554
Address of e located in stack is 0x22ff14
Address of *e located on heap is 0x541738
Address of getMeAPointer() located in .text is 0x4013b0
Address of a located in .bss is 0x405068
Address of b located in .data is 0x402000
Address of c located in stack is 0x22ff1c
Address of d located in stack is 0x22ff18
Address of *d located absolutely is 0x405554
Address of e located in stack is 0x22ff14
Address of *e located on heap is 0x3a1738
Address of getMeAPointer() located in .text is 0x4013b0
Address of a located in .bss is 0x405068
Address of b located in .data is 0x402000
Address of c located in stack is 0x22ff1c
Address of d located in stack is 0x22ff18
Address of *d located absolutely is 0x405554
Address of e located in stack is 0x22ff14
Address of *e located on heap is 0x351738
Address of getMeAPointer() located in .text is 0x4013b0
....etc....
Now these are my concerns:
Why does the heap move around while none of the other segments do? This is on Windows 7 with MinGW, and this file was compiled with GCC without further flags (I don't believe this is an example of Address Space Layout Randomization).
Who decides where the heap should be? I believe the linker reserves a place for the heap (I've seen heap symbols in symbol tables), but when is the exact address decided? Is it a runtime thing done by the program itself (C code) after loading, or is it done by the linker/loader/dynamic linker while loading the program, right before execution?
Is there any way to set the heap address in ld? I have understood that I can set all segments except stack (since that's built into the kernel of the OS) but can I set the heap address?
The way I understand it, the heap is not really an assembly language construct, and we don't have access to a heap if we choose to do plain assembly programming. Therefore it is a C construct, but I'm interested in how that affects the life of the heap (I mean, we speak of the heap as if it were on the same level as the segments and the stack, but if it isn't, that should come with a whole lot of other conditions)... Is this correct, and can anyone tell me something more about it?
I've Googled all day, to be honest, and I'm hungry for some answers!
Why is the heap moving around and none of the other segments?
Because dynamic memory allocation is, well, dynamic. The address you get back from malloc() depends on where the runtime finds a sufficiently big chunk of free memory at the moment your program runs, and on where the operating system placed the memory backing the heap when the process started, which can differ from run to run.
Who decides where the heap should be?
The developers of your operating system (which decides where memory is mapped into your process) and of your C runtime (which manages that memory on behalf of malloc()).
I believe the linker reserves a place for the heap
Rarely. On most implementations I've seen, it's entirely a runtime thing. (It's not impossible for the linker to have something to do with it, but it's uncommon.)
Is there any way to set the heap address in ld?
If there is, it's surely documented. (Assuming that the ld you are referring to is the linker in your toolchain.)
[...] I can set all segments except stack (since that's built into the kernel of the OS)
I don't think I fully understand what you are saying, but "the stack" is not "built into the kernel". The OS sets up the initial stack for a process when it starts it, and code generally refers to stack data relative to the stack pointer rather than by hard-coded absolute addresses.
the heap is not really an assembly language construct and we don't have access to a heap if we choose to just do assembly programming.
But yes, you do. I'm not familiar with Windows, but on most Unixes you can use the brk() and/or sbrk() calls (or mmap()) to get memory from the kernel.
Therefore it is a C construct
Your logic is flawed. Just because something is not an assembly thing, it does not automatically mean that it's a C thing. In fact, there's no such thing as "the heap" or "the stack" in C. In C, there are only storage durations (automatic, static, thread, and allocated, the last often called dynamic), which are not tied to any specific implementation.
A variety of factors will influence where the heap will reside, e.g.:
Base address of the application
Libraries required by the application
What the operating system allocates to the application
The C heap is (simplified) just a huge block of memory maintained by the runtime of the application, so the address you get when calling malloc() is determined by that runtime. The runtime differs between compiler versions and vendors. The memory block that the runtime uses as a heap is obtained from the operating system at application startup (and extended later if needed), and the operating system may return a different address each time the application is run. Therefore, heap addresses are not predictable. If your application allocates and deallocates memory during its run, the addresses get even "more" random, because the runtime now needs to find free blocks between currently allocated blocks and so on. So unless the sequence of allocations and deallocations is exactly the same between runs, you will get entirely different addresses, even if the base heap address is the same.

Initializing variable at address zero in C

This may be a pretty basic question. I understand that there is a C convention to set the value of null pointers to zero. Is it possible that you can ever allocate space for a new variable in Windows, and the address of that allocated space happens to be zero? If not, what usually occupies that address region?
On MS-DOS the null pointer is a fairly valid pointer: because the OS ran in real mode, it was actually possible to overwrite address 0x0 (where the interrupt vector table lives) with garbage and crash the system. You could do something like:
int i;
unsigned char* ptr = (unsigned char *)0x0;
for(i = 0; i < 1024; i++)
ptr[i] = 0x0;
Modern operating systems (e.g. Linux, Windows) run in protected mode, which never gives user programs direct access to physical memory.
The processor's MMU translates the virtual addresses your program uses into physical addresses, using page tables set up by the OS.
It also enforces access permissions: if you dare to touch something that doesn't belong to you, you'll be in trouble (your program will segfault). This most definitely includes trying to dereference the 0x0 address.
When you "set the value of a pointer to zero" as in
int *p = 0;
it will not necessarily end up pointing to physical address zero, as you seem to believe. When a pointer is assigned a constant zero value (or initialized with it), the compiler is required to recognize that situation and treat it in a special way: it must replace that zero with the implementation-defined null pointer value, which does not necessarily correspond to address zero.
The null pointer value is supposed to be represented by a physical address that won't be used for any other purpose. If in some implementation physical address zero is a usable address, then that implementation has to use a different physical address to represent null pointers. For example, some implementation might use address 0xFFFFFFFF for that purpose. In such an implementation the initialization
int *p = 0;
will actually initialize p with physical 0xFFFFFFFF, not with physical zero.
P.S. You might want to take a look at the FAQ: http://c-faq.com/null/index.html, which is mostly dedicated to exactly that issue.
The value 0 has no special meaning. It is a convention to set a pointer to 0, and the C compiler has to interpret it accordingly. However, there is no connection to physical address 0, and in fact that address can be a valid address. On many systems, though, the low addresses contain hardware-related data, like interrupt vectors. On the Amiga, for example, address 4 held the pointer to the Exec library base (ExecBase), the entry point into the operating system, which is also an arbitrary choice.
If an allocation function such as malloc() returns a null pointer, there was not enough memory available (or the request could not be satisfied), which means your variable could not be allocated.
Address 0x0 is not where x86 CPUs start executing at power-on; the reset vector is near the top of the physical address space (0xFFFFFFF0 on the 80386 and later), where the firmware (BIOS/UEFI) lives. In real mode, the first 1 KiB of physical memory holds the interrupt vector table. In a Windows process's virtual address space, the first 64 KiB are reserved and never mapped, precisely so that null pointer dereferences fault. It's an area that is not accessible to an application.
Given that, it should be clear that you cannot have a variable at address 0x0 in Windows.

Storing a number in a given hex location in C

Let's assume that there is a function store_at(int) which is supposed to store the passed number at a given hexadecimal address, as shown below:
void store_at(int val)
{
int *ptr;
ptr = (int *)0x261;
// logic goes here
return;
}
How do we write the logic to store val at the given hex address (0x261 in this case)?
Does saying *ptr = val; work? I vaguely remember reading somewhere that this is not allowed in C.
*ptr = val; works, but you have to make sure this address is allocated and, even more, accessible. Without knowing what platform you are programming for, I can only suggest some ways of dealing with addresses you don't have permission to access; it pretty much depends on the architecture and/or operating system you're using.
For example, on the ATmega32 microcontroller there are no restrictions on accessing data memory: you can read and write any address directly (it is a Harvard architecture, so code executes from flash, not from RAM):
PORTB = 1;
// PORTB is an 8-bit register mapped at data address 0x38 on the ATmega32,
// so the equivalent access uses a volatile 8-bit pointer:
*((volatile unsigned char *)0x0038) = 1;
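But that's on embedded systems. If you want full access to a memory region (as long as it's inside your application's sandbox), you can change the protection of pages that are already mapped in your process using VirtualProtect on Windows and mprotect on Linux (neither function maps new memory):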
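#include <windows.h>
int val = 123;
DWORD oldprotection;
// VirtualProtect only changes pages already committed in this process; it returns 0
// (failure) for 0x261, which lies in the never-mapped first 64 KB of the address space.
if (VirtualProtect((LPVOID)0x261, sizeof(int), PAGE_EXECUTE_READWRITE, &oldprotection))
    *(int *)0x261 = val;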
And here are the types of protection you can use with it: Memory Protection Constants.
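And an mprotect example:
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>
#define PAGESIZE ((uintptr_t)sysconf(_SC_PAGESIZE))  // page size queried at run time
int val = 123;
if (mprotect((void *)((0x261 / PAGESIZE) * PAGESIZE), 0x261 % PAGESIZE + sizeof(int), PROT_READ | PROT_WRITE | PROT_EXEC) == 0) *(int *)0x261 = val;  // write only if mprotect succeeded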
Note that this mprotect example is untested; you may need to adjust the length of the protected range or other details.
The division by PAGESIZE there is just a trick to round the address down to a page boundary, since mprotect requires a page-aligned address. Also note that your address is unusable on Linux: because PAGESIZE is larger than 0x261, the division rounds it down to page 0, and the lowest pages are normally not mappable at all (see the vm.mmap_min_addr setting), so mprotect will fail.
According to the syntax for accessing an address through a pointer, all of these work:
*(int *)0x261 = val;
int *ptr = (int *)0x261;
*ptr = val;
Yes, the expression *ptr = val (and even *(int *)0x261 = val;) is valid C; the conversion from an integer to a pointer is implementation-defined. But then you're facing the technical limitations of runtime environments.
Modern operating systems usually run processes in a sandbox of virtual memory (so processes can't access and spoil the memory of other processes), and technically the virtual memory of a process looks like a set of regions which you can access, some read-only, some that don't allow executing code from them, and so on. When you try to access an unavailable VM region, you'll get SIGSEGV on Unix-like systems or an Access Violation error on Windows; the same happens when writing to a read-only region or trying to execute code in a region where the operating system prohibits it. (For example, you can see the virtual memory mappings of a Linux process in /proc/$PID/maps.)
The memory of a process is usually managed by the operating system: you get new heap memory through C library functions like malloc() and calloc(), which in turn obtain memory from the OS, and the main stack is set up by the OS at process startup. So in user-space programming you virtually never need to reference data by a literal pointer.
Other possible environments are kernel-space or bare-metal C programs, where all of physical memory is available to you, but you still must be aware of what you are accessing (it may be I/O ports, a gap in the physical memory, a region reserved by hardware, and so on). Programming such environments is an advanced topic and needs solid C experience.

What does a pointer value mean?

Consider the really simple code below:
#include <stdio.h>
#include <stdlib.h>
int main() {
int* a = (int*) malloc(10 * sizeof(int));
printf("a = %p, a+1 = %p\n", (void *) a, (void *) (a+1));  // %p expects void *
free(a);
return 0;
}
The output is this:
a = 0x127f190, a+1 = 0x127f194
Since the size of an int is 4 bytes, I am assuming from the result above that a pointer value is the index of a byte in my RAM. Hence a+1 in fact increases the value of a by sizeof(int) = 4 bytes. Is that correct?
If yes, then why do I get 32 bit memory addresses from my program? This machine is 64bit running a 64bit version of Ubuntu. How do I get the program to print a full 64bit address? Do I have to compile it with special flags?
You are correct about your pointer. Memory is traditionally organized and addressed in bytes, pointers point to the first byte of whatever they are pointing to (EDIT: they don't HAVE to, but on the usual platforms and compilers they do).
You are not seeing a "64 bit" pointer simply because the output strips the leading 0s :-) If you do a sizeof(a), chances are that you'll get "4" on a 32 bit system, and "8" on a 64 bit system.
the index of a byte on my RAM memory
That's close enough for most purposes, but not entirely accurate. It's the index of a byte in your process's address space. Virtual addresses do not relate directly to physical RAM, there's a thing called a "virtual memory manager" that's responsible for keeping track of what virtual addresses in each process refer to what physical RAM (and things other than RAM).
Normally you can forget about this, and just think of the virtual address space as RAM (or as "memory" to keep it abstract). But the same virtual address in different processes could refer to different physical memory, or the same memory could be referred to in different processes by different virtual addresses. Or the same virtual address in the same process could refer to different physical memory at different times, if the OS has noticed that the page hasn't been used for a while, swapped it to disk and then back to memory when it's used again. So it's not really "the" address of the RAM itself, it's just the address that your process has been given by the OS, to refer to some RAM.
The reason you're seeing an increase of four is that you're allocating memory for ints, which are four bytes long with gcc on x86 Linux, whether you compile 32-bit or 64-bit code. As already said, the pointers you get refer to virtual memory addresses, not physical memory.
If you change your int to long, you'll see a 4 byte increase with 32bit code and an 8 byte increase with 64bit code.
Additionally, if you look at sizeof(void *) it will tell you if your pointers are 32bit or 64bit. If your pointers are 64bit, then you'll be getting 64bit pointers printed with %p.
I edited your program to run on my copy of Ubuntu, adding:
printf("Size of pointer = %d\n", (int)sizeof(void *));
Here's the output:
a = 0x2067010, a+1 = 0x2067014
Size of pointer = 8
So the pointer is indeed 64bit.
Sorry not to give you an exact answer, but here it comes:
Look up pointer arithmetic, and you will find all you are looking for.
Even if your system runs x64, some compilers default to building 32-bit x86 code unless you explicitly ask for x64, so check your compiler's documentation for its x64 flag and related options (with gcc, -m64 and -m32 select the target).
When you wrote int* a = (int*) malloc(10 * sizeof(int));, you were allocating memory for an array of ten elements. That is similar to int a[10]; (although a here is a pointer to heap memory, not an array), and a = 0x127f190 is the address of the first element, a[0].
