Given the following C code:
void *ptr;
ptr = malloc(100);
printf("Address: %p\n", ptr);
When compiling this code using GCC 4.9 on 64-bit Ubuntu and running it, the output is similar to this:
Address: 0x151ab10
The value 0x151ab10 seems a reasonable number, since my machine has 8 GB of RAM, but when compiling the same code using GCC 4.9 on 64-bit Mac OS X and running it, the output is similar to this:
Address: 0x7fb9cb43ed30
... which is strange because 0x7fb9cb43ed30 is well above the 8 GB of RAM. Is there some kind of bit masking that one has to do in Mac OS X so that the real address of ptr can be printed out?
When processes run in general-purpose operating systems, the operating system constructs a “virtual” address space for each process, using assistance from hardware.
Whenever a process with a virtual address space accesses memory, the hardware translates the virtual address (in the process’ address space) to a physical address (in actual memory hardware), using special registers in the hardware and tables in system memory that describe how the translation should be done.¹ The operating system configures the registers and tables for each process.
Commonly, the operating system or the loader (the software that loads programs into memory for execution) assigns various ranges of the virtual address space to various purposes. It may put the stack in one place, executable code in another, general space for allocatable memory in another, and special system data in another. These addresses may come from base locations set arbitrarily by human designers, from various calculations, or from combinations of the two.
So seeing a large virtual address is not unusual; it is simply a location that was assigned in an imaginary address space.
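As a rough illustration, here is a minimal sketch that prints where a few objects land in a process's virtual address space. The exact numbers vary per OS and per run (especially with address-space layout randomization), and the region names in the comments are the conventional ones, not anything the program can verify:

#include <stdio.h>
#include <stdlib.h>

int my_global;                      /* static data region */

int main(void) {
    int my_local;                   /* stack */
    void *p = malloc(16);           /* heap / general allocatable space */

    /* Casting a function pointer to void* is a common extension,
       fine for a quick look at where code lives. */
    printf("code:   %p\n", (void *)main);
    printf("static: %p\n", (void *)&my_global);
    printf("heap:   %p\n", p);
    printf("stack:  %p\n", (void *)&my_local);

    free(p);
    return 0;
}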
Footnote
¹ There are additional complications in translating virtual addresses to physical addresses. When the processor translates an address, the result may be that the desired location is not in physical memory at all. When this happens, the processor notifies the operating system. In response, the operating system can allocate some physical memory, read the necessary data from disk, update the memory map of the process so that the virtual address points to the newly allocated physical memory, and resume execution of the process. Then your process can continue as if the data were there all along. Additionally, when the system allocated physical memory, it may have had to make some memory available by writing data that was in memory to disk, and also removing it from the memory map of some process (possibly the same one). In this way, disk space becomes auxiliary memory, and processes can execute with more memory in their virtual address spaces than there is in actual physical memory.
Related
Consider the sample below.
char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';
The 4 KB of memory is allocated by calling malloc(). The OS handles the memory request from the user program in user space. First, the OS requests memory allocation from RAM, then RAM gives a physical memory address to the OS. Once the OS receives the physical address, it maps the physical address to a virtual address, then it returns the virtual address, which is the address of p, to the user program.
I wrote some values ('a' and 'b') to the virtual address, and they are really written into main memory (RAM). I'm confused that I wrote the values to a virtual address, not a physical address, yet they really end up in main memory (RAM) even though I didn't do anything to make that happen.
What happens behind the scenes? What does the OS do for me? I couldn't find relevant material in some books (OS, systems programming).
Could you give some explanation? (Please omit the details about caches, for easier understanding.)
A detailed answer to your question would be very long, far too long to fit here on Stack Overflow.
Here is a very simplified answer to a little part of your question.
You write:
I'm confused that I wrote the values to a virtual address, not a physical address, yet they really end up in main memory
It seems you have a very fundamental misunderstanding here.
There is no memory directly "behind" a virtual address. Whenever you access a virtual address in your program, it is automatically translated to a physical address and the physical address is then used for access in main memory.
The translation happens in hardware, i.e. inside the processor, in a block called the MMU (memory management unit, see https://en.wikipedia.org/wiki/Memory_management_unit).
The MMU holds a small but very fast look-up table that tells how a virtual address is to be translated into a physical address. The OS configures this table, but after that, the translation happens without any software being involved, and, just to repeat, it happens whenever you access a virtual memory address.
The MMU also takes some kind of process ID as input in order to do the translation. This is needed because two different processes may use the same virtual address, but those will need translation to two different physical addresses.
As mentioned above, the MMU look-up table (the TLB) is small, so the MMU can't hold all the translations for a complete system. When the MMU can't do a translation, it raises an exception of some kind so that some OS software is triggered. The OS will then re-program the MMU so that the missing translation gets into the MMU, and the process execution can continue. Note: some processors can do this in hardware, i.e. without involving the OS.
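To make the per-process translation concrete, here is a minimal POSIX sketch (assuming Linux or macOS): after fork(), the parent and the child print the same virtual address for the variable x, yet each sees its own value, because that one virtual address is mapped to a different physical page in each process:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int x = 1;
    pid_t pid = fork();
    if (pid == 0) {               /* child process */
        x = 2;                    /* copy-on-write gives the child its own page */
        printf("child:  &x = %p, x = %d\n", (void *)&x, x);
        exit(0);
    }
    wait(NULL);                   /* let the child print first */
    printf("parent: &x = %p, x = %d\n", (void *)&x, x);
    return 0;
}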
You have to understand that virtual memory is just that, virtual: it can be more extensive than the physical RAM, so it is mapped onto physical memory indirectly rather than corresponding to it one-to-one, even though in the end the same underlying storage is being accessed.
Your programs use virtual memory addresses, and it is your OS that decides what to keep in RAM. If RAM fills up, it will use some space on the hard drive to continue working.
But the hard drive is slower than RAM, which is why your OS uses a page-replacement algorithm (such as FIFO or LRU) to exchange pages of memory between the hard drive and RAM, depending on the work being done, so that the data most likely to be used stays in fast memory. To swap pages back and forth, the OS does not need to modify the virtual memory addresses.
(A summary that overlooks a lot of details.)
You want to understand how virtual memory works. There are lots of online resources about this; here's one I found that seems to do a fair job of explaining it without getting too deep into technical details, but also doesn't gloss over important terms.
https://searchstorage.techtarget.com/definition/virtual-memory
For Linux on x86 platforms, the assembly equivalent of asking for memory is basically a call into the kernel using int 0x80, with some parameters for the call set in certain registers. The interrupt handler is installed at boot by the OS so that it can answer such requests; it is registered in the IDT.
An IDT descriptor for 32-bit systems looks like:

#include <stdint.h>

struct IDTDescr {
    uint16_t offset_1;  // offset bits 0..15
    uint16_t selector;  // a code segment selector in GDT or LDT
    uint8_t  zero;      // unused, set to 0
    uint8_t  type_attr; // type and attributes
    uint16_t offset_2;  // offset bits 16..31
};
The offset is the address of the entry point of the handler for that interrupt. So interrupt 0x80 has an entry in the IDT, and this entry points to the address of the handler (also called the ISR). When you call malloc(), the compiler emits a call into the C library; the library in turn makes a system call (such as brk or mmap) when it needs more memory from the kernel. The system call returns the address of the allocated memory in some register. I'm pretty sure as well that on modern x86 this system call will actually use the sysenter instruction, rather than int 0x80, to switch into kernel mode. This instruction is used alongside an MSR (Model Specific Register) to securely jump into kernel mode from user mode, at the address specified in the MSR.
Once in kernel mode, all instructions can be executed and access to all hardware is unlocked. To satisfy the request, the OS doesn't "ask RAM for memory". RAM isn't aware of what memory the OS uses. RAM just blindly answers to asserted pins on its DIMM and stores the information. At boot, the OS checks the ACPI tables that were built by the BIOS to determine how much RAM there is and which devices are connected to the computer, so that it avoids writing to an MMIO (memory-mapped I/O) region. Once the OS knows how much RAM is available (and what parts are usable), it uses algorithms to determine what parts of the available RAM every process should get.
When you compile C code, the compiler (and linker) determine the virtual addresses of everything statically allocated at compile/link time. When you launch that executable, the OS is aware of all the memory the process will initially use, so it sets up the page tables for that process accordingly. When you ask for memory dynamically using malloc(), the OS determines what part of physical memory your process should get and changes (during runtime) the page tables accordingly.
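To illustrate that dynamic path, here is a minimal Linux-flavored sketch that skips malloc() and asks the kernel directly for one page via the mmap() system call, which is one of the calls a malloc implementation may use under the hood:

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    /* Ask the kernel for one 4096-byte page of anonymous, private memory. */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("the kernel mapped a page at virtual address %p\n", p);
    munmap(p, 4096);               /* hand the page back */
    return 0;
}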
As to paging itself, you can always read some articles; a short version is 32-bit paging. In 32-bit paging, each CPU core has a CR3 register. This register contains the physical address of the base of the Page Global Directory (PGD). The PGD contains the physical addresses of the bases of several page tables, which themselves contain the physical addresses of the bases of several physical pages (https://wiki.osdev.org/Paging). A virtual address is split into three parts: the 12 rightmost bits (the LSBs) are the offset within the physical page, the 10 bits in the middle are the index into the page table, and the 10 MSBs are the index into the PGD.
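A small self-contained sketch of that 10/10/12 split (the example address is arbitrary):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t vaddr = 0xDEADBEEFu;                /* an arbitrary example address */
    uint32_t pgd_index = (vaddr >> 22) & 0x3FFu; /* top 10 bits: PGD index */
    uint32_t pt_index  = (vaddr >> 12) & 0x3FFu; /* middle 10 bits: page-table index */
    uint32_t offset    =  vaddr        & 0xFFFu; /* low 12 bits: offset in page */

    printf("PGD index: %u, page-table index: %u, offset: 0x%03x\n",
           pgd_index, pt_index, offset);
    return 0;
}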
So when you write
char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';
you create a pointer of type char* and make a call that ultimately asks the OS, via a system call, for 4096 bytes of memory. The OS puts the first address of that chunk of memory into a certain conventional register (which one depends on the system and the OS). You should not forget that the C language is just a convention; the convention is implemented by a compiler written specifically for a given OS. That means the compiler knows what register and what interrupt number to use for the system call, because it was written for that OS. The compiler will thus take the address stored in this conventional register and store it into your pointer of type char* at runtime. On the second line you are telling the compiler that you want to take the char at that first address and make it an 'a'. On the third line you make the second char a 'b'. In the end, you could write an equivalent:
char* p = (char*)malloc(4096);
*p = 'a';
*(p + 1) = 'b';
Here p is a variable containing an address. The + operation on a pointer advances this address by the size of the pointed-to type. In this case, the pointer points to a char, so the + operation advances the pointer by one char (one byte). If it pointed to an int, it would be advanced by 4 bytes (32 bits). The size of the pointer itself depends on the system: if you have a 32-bit system, the pointer is 32 bits wide (because it contains an address); on a 64-bit system the pointer is 64 bits wide. A static-memory equivalent of what you did is
char p[4096];
p[0] = 'a';
p[1] = 'b';
Now the compiler will know at compile time what memory this array will get. It is static memory. Even then, p acts as a pointer to the first char of that array. It means you could write
char p[4096];
*p = 'a';
*(p + 1) = 'b';
It would have the same result.
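A quick sketch to see the pointer-scaling rule described above in action:

#include <stdio.h>

int main(void) {
    char cbuf[4];
    int  ibuf[4];
    char *cp = cbuf;
    int  *ip = ibuf;

    /* +1 advances by sizeof(char) == 1 byte for cp,
       and by sizeof(int) (typically 4 bytes) for ip. */
    printf("char*: %p -> %p\n", (void *)cp, (void *)(cp + 1));
    printf("int*:  %p -> %p (step of %zu bytes)\n",
           (void *)ip, (void *)(ip + 1), sizeof(int));
    return 0;
}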
First, the OS requests memory allocation from RAM,…
The OS does not have to request memory. It has access to all of the memory in the machine from the moment it boots. It keeps its own database of which parts of that memory are in use for what purposes. When it wants to provide memory to a user process, it uses its own database to find some memory that is available (or does things to stop using memory for other purposes and then makes it available). Once it chooses the memory to use, it updates its database to record that it is in use.
… then RAM gives a physical memory address to the OS.
RAM does not give addresses to the OS except that, when starting, the OS may have to interrogate the hardware to see what physical memory is available in the system.
Once the OS receives the physical address, it maps the physical address to a virtual address…
Virtual memory mapping is usually described as mapping virtual addresses to physical addresses. The OS has a database of the virtual memory addresses in the user process, and it has a database of physical memory. When it is fulfilling a request from the process to provide virtual memory, and it decides to back that virtual memory with physical memory, the OS informs the hardware of the mapping it chose. This depends on the hardware, but a typical method is that the OS updates some page table entries that describe what virtual addresses get translated to what physical addresses.
I wrote some values ('a' and 'b') to the virtual address, and they are really written into main memory (RAM).
When your process writes to virtual memory that is mapped to physical memory, the processor will take the virtual memory address, look up the mapping information in the page table entries or other database, and replace the virtual memory address with a physical memory address. Then it will write the data to that physical memory.
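A toy model of that sequence in C (every structure here is invented for illustration; real page tables are hardware-defined, multi-level, and live in kernel-managed memory):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 16u

static uint32_t page_table[NUM_PAGES];              /* virtual page -> physical frame */
static uint8_t  physical_memory[NUM_PAGES * PAGE_SIZE];

/* What the hardware conceptually does on every store. */
static void hw_store(uint32_t vaddr, uint8_t value) {
    uint32_t vpn    = vaddr / PAGE_SIZE;            /* virtual page number */
    uint32_t offset = vaddr % PAGE_SIZE;
    uint32_t pfn    = page_table[vpn];              /* the table look-up */
    physical_memory[pfn * PAGE_SIZE + offset] = value;
}

int main(void) {
    page_table[0] = 7;        /* the "OS" maps virtual page 0 to physical frame 7 */
    hw_store(0x001, 'a');     /* the "program" writes through a virtual address */
    printf("byte stored in frame 7, offset 1: %c\n",
           physical_memory[7 * PAGE_SIZE + 1]);
    return 0;
}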
In C, when you get the address of a variable, is that address an address that really exists in the RAM of the computer, or just an address in some fake memory made up by the C compiler (if that's how it really works)? Can you explain in layman's terms?
Yes and no. When you take the address of a variable and perform some operations on it (assuming the compiler doesn't optimize it away), it will correspond to an address in RAM. However, because of virtual memory, the address used in your program is almost certainly not the address of the variable in physical RAM. The kernel remaps which virtual addresses (what your program sees) refer to which physical addresses (what the memory hardware sees), so that different processes can be loaded into memory at the same time yet not be able to access each other's memory. Additionally, your process's memory can be paged out (written to disk) if it has not been used recently and/or something else needs more memory, and later reloaded into a completely different physical address, yet the virtual address will remain the same.
So yes, when you access a pointer, that address corresponds to an address in memory. But it doesn't correspond to the actual address in RAM, and the address it corresponds to can change over time.
The short answer is "neither".
In general terms, the address of a variable in memory is in the context of a running program's address space.
What differs is how the program's address space is mapped to hardware by the host system.
With modern hardware that has a memory management unit (MMU), and operating systems (or their device drivers) that use the MMU, a program's address space is mapped to physical memory, which may be backed by RAM or by secondary storage such as a swap file on a hard drive. The operating system uses the MMU to isolate programs from each other (so two processes cannot access each other's address space) and also uses the MMU to support swapping of data between RAM and the swap area. The running process cannot generally tell where its data is in physical memory, because the operating system and MMU specifically prevent it from doing so. Over time, the operating system and MMU may migrate memory used by a program to different areas of RAM or to swap, but the program cannot detect this, since the operating system and MMU take care of mapping an address in the program (which never changes as far as the program is concerned) to the actual location. This covers most modern versions of Windows, Unix, and various realtime operating systems. (Those systems also typically provide means of programmatically accessing physical memory, but only for programs that are running with higher privileges or for kernel-mode drivers.)
Older hardware did not have an MMU, so operating systems were not able to give programs separate address spaces. On such systems, the address as seen by a program had a one-to-one correspondence to a location in physical memory.
Somewhere in between was hardware that had separate areas of physical memory (e.g. provided by distinct banks of memory chips). On those systems, with support of special drivers, a host system could implement a partial mapping between addresses in a program's address space, and locations in particular areas of physical memory. This is why some target systems, and compilers that support them, support more than one pointer type (e.g. with names like near, far, and huge) as a compiler extension. In those cases, a pointer could refer to a location in a particular area of memory, and there may be some mapping of values, for each pointer type, from the value of a pointer seen by a program to the actual location within a corresponding area of physical memory.
The C compiler does not become a part of the executable program it builds (otherwise, to install any built program, it would be necessary to also install and execute the compiler used to build it, or the program would not run). Typically, a compiler is no longer running when a program is executed (or, at least, a program cannot rely on it being present). A program therefore cannot access addresses within the compiler's address space.
In an interpreted environment (e.g. C code is interpreted by another program - the interpreter) the interpreter acts as an intermediary between the program and the hardware, and handles mapping between a program's address space, the interpreter's address space, and physical memory. C interpreters are relatively rare in practice, compared with toolchains that use compilers and linkers.
On ancient OSes, the MMU either isn't present on the target processor or isn't used (even if the processor has one).
In that case, physical addresses are used, which is simpler to understand but also annoying: when you're debugging an assembly program or trying to decode a traceback, you have to know where the program was loaded, or the post-mortem traceback is useless.
Without an MMU, you can do very hacky and simple things: shared memory can be coded in a few lines, you can inspect the whole memory very easily, etc.
On modern OSes that rely on the processor's MMU and address translation, executables run in virtual memory, which isn't an issue since they cannot access other executables' memory anyway.
The good side is that if you're running/debugging the same executable many times, you always get the same addresses (at least while address-space layout randomization is disabled). This is useful in long debugging sessions where you have to restart the debugger many times.
Also, some languages/compilers (like GNAT Ada compiler) provide a traceback with addresses when the program does something illegal. Using addr2line on the executable, you're able to get the exact traceback even after the process has ended and memory has been released.
The exception I know of is Windows shared libraries (DLL) which are almost never loaded at the same address, since this address is potentially shared between several executables. In those cases, for instance, a post-mortem traceback will be useless because the declared symbol address has an offset from the actual traceback address.
In a multi-process environment where multiple processes run at the same time, the linker cannot decide the addresses of variables at compile time.
The reason is simple: if you assigned a dedicated physical address to each variable, you would limit the number of processes that can run on your system.
So the linker assigns virtual addresses to the variables, and those addresses are translated to physical addresses at run time with the help of the OS and the processor.
One example of such a system is Linux running on an x86 CPU.
In other cases, where only one process/application runs on the processor, the linker can assign actual physical addresses to variables.
Example: embedded systems performing dedicated tasks, such as an oven controller.
It's documented that variables allocated on the heap are stored in the low address area and grow towards the stack and vice versa. I decided to test this out:
#include <stdio.h>
#include <stdlib.h>

const char my_const_global_var = '0';
char my_global_var = '0';

int main(void) {
    char my_stack_var = '0';
    char* my_heap_var = (char*) malloc(1);
    *my_heap_var = '0';
    printf("const global: %p\n", (void*)&my_const_global_var);
    printf("global:       %p\n", (void*)&my_global_var);
    printf("heap:         %p\n", (void*)my_heap_var);
    printf("stack:        %p\n", (void*)&my_stack_var);
    free(my_heap_var);
    return 0;
}
It appears that my_const_global_var and my_global_var are addressed in the low address area (shortly after 000XXXXX and before the heap), but what surprises me is that my_stack_var is addressed at roughly the 75% mark (around bffbdaXX). I'm guessing I'll get a segfault when my global/heap/stack variables exceed 3 GB of memory, so I did a search and found mention of a 3 GB barrier, but no mention of what happens in the remaining 1 GB of addressable space.
What happens in the remaining 25% of the memory address space?
Protected mode operating systems running on 32-bit x86 CPUs usually divide the 32-bit virtual address space into two main regions: the first is for user processes and the second is for the kernel. The virtual address space doesn't address physical memory directly. Instead it's mapped through a page table, maintained by the kernel, to physical memory. This allows the operating system to give each process its own virtual address space, isolating processes from each other. When it switches processes, it changes page tables so that the user area of the virtual address space points to the physical memory locations used by the new process.
However when it switches processes, the operating system doesn't change page table entries that correspond to the kernel region. This means that while each process has its own physical memory mapped into the user region, the kernel region remains the same for each process. Mapping the kernel memory into the virtual address space of each process allows for much quicker transition from user mode to kernel mode when performing system calls. If the kernel wasn't mapped into every process, the operating system would essentially have to switch processes (to a hypothetical kernel "process") to perform a system call. Switching processes is a much more expensive operation than just transitioning from user mode to kernel mode.
Most 64-bit x86 operating systems also have a similar split, but since they have a much bigger virtual address space they divide it into much larger chunks. When running 32-bit programs, these operating systems usually give the program access to all, or nearly all, of the first 4 GB of virtual address space.
Note that how virtual memory is split into user and kernel regions is unaffected by the amount of physical memory in the system. Unless explicitly configured differently, a machine with only 64 MB of RAM will have the same user/kernel virtual address space split as a machine with 64 GB of memory when running the exact same operating system.
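On Linux you can observe the user-space half of this split directly by dumping the process's own memory map; the kernel region simply never shows up in the listing. A minimal Linux-only sketch:

#include <stdio.h>

int main(void) {
    /* /proc/self/maps lists this process's user-space mappings:
       text, heap, libraries, stack, etc. */
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    int c;
    while ((c = fgetc(f)) != EOF)
        putchar(c);
    fclose(f);
    return 0;
}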
I understand that if I try to print the address of an element of an array, it will be an address from virtual memory, not from real (physical) memory, i.e. DRAM.
printf ("Address of A[5] and A[6] are %u and %u", &A[5], &A[6]);
I found the addresses were consecutive (the elements are chars). In reality they may not be consecutive, at least not in the DRAM. I want to know the real addresses. How do I get them?
I need to know this for either Windows or Linux.
You can't get the physical address for a virtual address from user code; only the lowest levels of the kernel deal with physical addresses, and you'd have to intercept things there.
Note that the physical address for a virtual address may not be constant while the program runs — the page might be paged out from one physical address and paged back in to a different physical address. And if you make a system call, this remapping could happen between the time when the kernel identifies the physical address and when the function call completes because the program requesting the information was unscheduled and partially paged out and then paged in again.
The simple answer is that, in general, for user processes or threads in a multiprocessing OS such as Windows or Linux, it is not possible to find the address even of a static variable in the processor's memory address space, let alone the DRAM address.
There are a number of reasons for this:
The memory allocated to a process is virtual memory. The OS can remap this process memory from time to time from one physical address range to another, and there is no way to detect this remapping in the user process. That is, the physical address of a variable can change during the lifetime of a process.
There is no general interface from userspace to kernel space that would allow a userspace process to walk through the kernel's process table and page cache in order to find the physical addresses of its pages. In Linux you can write a kernel module or driver that can do this. (Linux does also expose /proc/self/pagemap, which a sufficiently privileged process can read; see the sketch below.)
The DRAM is often mapped into the processor address space through a memory management unit (MMU) and memory cache. Although the MMU mapping of DRAM into the processor address space is usually done only once, during system boot, the processor's use of the cache can mean that values written to a variable might not be written through to the DRAM in all cases.
There are OS-specific ways to "pin" a block of allocated memory to a static physical location. This is often done by device drivers that use DMA. However, this requires a level of privilege not available to userspace processes, and, even if you have the physical address of such a block, there is no pragma or directive in the commonly used linkers that you could use to allocate the BSS for a process at such a physical address.
Even inside the Linux kernel, virtual to physical address translation is not possible in the general case, and requires knowledge about the means that were used to allocate the memory to which a particular virtual address refers.
Here is a link to an article called Translating Virtual to Physical Address on Windows: Physical Addresses that gives you a hint as to the extreme ends to which you must go to get physical addresses on Windows.
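For the Linux counterpart mentioned above, here is a hedged sketch using /proc/self/pagemap. The entry format (64 bits per page, PFN in bits 0-54, "present" in bit 63) is documented in the kernel's pagemap documentation; note that since Linux 4.0 the PFN reads as zero unless the process has CAP_SYS_ADMIN, e.g. is run as root:

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int x = 42;                                   /* the variable to locate */
    uintptr_t vaddr = (uintptr_t)&x;
    long page_size = sysconf(_SC_PAGESIZE);

    FILE *f = fopen("/proc/self/pagemap", "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }

    /* One 64-bit entry per virtual page, indexed by virtual page number. */
    uint64_t entry;
    if (fseek(f, (long)(vaddr / (uintptr_t)page_size) * (long)sizeof entry,
              SEEK_SET) != 0 ||
        fread(&entry, sizeof entry, 1, f) != 1) {
        perror("pagemap read");
        fclose(f);
        return 1;
    }
    fclose(f);

    uint64_t pfn = entry & ((1ULL << 55) - 1);    /* bits 0-54: frame number */
    if ((entry & (1ULL << 63)) && pfn != 0) {     /* bit 63: present in RAM */
        printf("virtual %p -> physical 0x%llx\n", (void *)&x,
               (unsigned long long)(pfn * (uint64_t)page_size
                                    + vaddr % (uintptr_t)page_size));
    } else {
        printf("page not present, or PFN hidden (try running as root)\n");
    }
    return 0;
}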
I mean the physical memory, the RAM.
In C you can try to access any memory address, so how does the operating system prevent your program from changing memory that is not in your program's memory space?
Does it set specific memory addresses as the beginning and end for each program? If so, how does it know how much is needed?
Your operating system kernel works closely with memory management (MMU) hardware, when the hardware and OS both support this, to make it impossible to access memory you have been disallowed access to.
Generally speaking, this also means the addresses you access are not physical addresses but rather are virtual addresses, and hardware performs the appropriate translation in order to perform the access.
This is what is called memory protection. It may be implemented using different methods. I'd recommend you start with the Wikipedia article on this subject: http://en.wikipedia.org/wiki/Memory_protection
Actually, your program is allocated virtual memory, and that's what you work with. The OS gives you a part of the RAM; you can't access other processes' memory (unless it's shared memory, but look that up).
It depends on the architecture, on some it's not even possible to prevent a program from crashing the system, but generally the platform provides some means to protect memory and separate address space of different processes.
This has to do with a thing called 'paging', which is provided by the CPU itself. In old operating systems, you had 'real mode', where you could directly access memory addresses. In contrast, paging gives you 'virtual memory', so that you are not accessing the raw memory itself, but rather, what appears to your program to be the entire memory map.
The operating system does "memory management", often coupled with TLBs (Translation Lookaside Buffers) and virtual memory, which translates every address to a page that the operating system can tag as readable or executable in the current process's context.
The minimum requirement for a processor's MMU (memory management unit) is to restrict, in the current context, the accessible memory to a range that can only be set via processor registers in supervisor mode (as opposed to user mode).
The logical address is generated by the CPU and is mapped to the physical address by the memory management unit (MMU). Unlike the physical address space, the logical address space is not restricted by the memory size, and you only ever work with the logical address space. The address binding is done by the MMU, so you never deal with the physical address directly.
Most computers (and all PCs since the 386) have something called the Memory Management Unit (or MMU). Its job is to translate local addresses used by a program into the physical addresses needed to fetch real bytes from real memory. It's the operating system's job to program the MMU.
As a result of this, programs can be loaded into any region of memory and appear, from the program's point of view while executing, to be at any other address. It's common to find that the code of all programs appears (locally) to be at the same address and their data always appears (locally) to be at the same address, even though physically they will be in different locations. With each memory access, the MMU transparently translates from the local address space to the physical one.
If a program tries to access a memory address that has not been mapped into its local address space, the hardware generates an exception, and the program typically gets flagged with a "segmentation violation" followed by forcible termination. This protects the memory of other processes from being accessed.
But that doesn't have to be the case! On systems with "virtual memory", when current demands on RAM exceed the amount of physical memory, some pages (blocks of memory of a common size, often 4 to 8 KB) can be written out to disk and their RAM handed to a program trying to allocate and use new memory. Later on, when such a page is needed by whatever program owns it, the memory access causes an exception, and the OS swaps out some other memory page and re-loads the needed one from disk. The program that "page-faulted" gets delayed while this happens but otherwise notices nothing.
There are lots of other tricks the MMU/OS can do as well, like sharing memory between processes, making a disk file appear to be directly memory-accessible, setting some pages as "NX" so they can't be treated as executable code, using arbitrary sections of the logical memory space regardless of how much physical RAM there is and at what addresses it sits, and more.
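Finally, to see the protection described above actually fire, here is a minimal POSIX sketch: map a page read-only, then write to it; the MMU raises a fault and the OS terminates the process with SIGSEGV:

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    /* One anonymous page, readable but not writable. */
    char *p = mmap(NULL, 4096, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("reading is fine: %d\n", p[0]);   /* anonymous pages read as zero */
    p[0] = 'x';                              /* write -> hardware fault -> SIGSEGV */
    return 0;                                /* never reached */
}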