Consider the sample below.
char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';
The 4 KB of memory is allocated by calling malloc(). The OS handles the memory request from the user program in user space. First, OS requests memory allocation to RAM, then RAM gives physical memory address to OS. Once OS receives physical address, OS maps the physical address to virtual address, then OS returns the virtual address, which is the address of p, to the user program.
I wrote some values ('a' and 'b') through the virtual address and they are really written into main memory (RAM). I'm confused: I wrote some value to a virtual address, not a physical address, but it is really written to main memory (RAM), even though I did nothing to make that happen.
What happens behind the scenes? What does the OS do for me? I couldn't find relevant material in the books I checked (on operating systems and system programming).
Could you give some explanation? (Please omit the details about caches, for easier understanding.)
A detailed answer to your question would be very long, too long to fit here on Stack Overflow.
Here is a very simplified answer to a little part of your question.
You write:
I'm confused: I wrote some value to a virtual address, not a physical address, but it is really written to main memory
It seems you have a very fundamental misunderstanding here.
There is no memory directly "behind" a virtual address. Whenever you access a virtual address in your program, it is automatically translated to a physical address and the physical address is then used for access in main memory.
The translation happens in hardware, i.e. inside the processor, in a block called the MMU (Memory Management Unit; see https://en.wikipedia.org/wiki/Memory_management_unit).
The MMU holds a small but very fast look-up table that tells it how a virtual address is to be translated into a physical address. The OS configures this table, but after that the translation happens without any software being involved and, just to repeat, it happens on every access to a virtual memory address.
The MMU also takes some kind of process ID as input when doing the translation. This is needed because two different processes may use the same virtual address, and those addresses will need to translate to two different physical addresses.
As mentioned above, the MMU look-up table (the TLB) is small, so it can't hold all translations for a complete system. When the MMU can't do a translation, it raises an exception of some kind so that some OS software is triggered. The OS then re-programs the MMU so that the missing translation is loaded, and the process can continue execution. Note: some processors can do this in hardware, i.e. without involving the OS.
You have to understand that virtual memory is exactly that: virtual. It can be larger than physical RAM, so virtual addresses are mapped onto physical memory rather than coinciding with it, even though every access ultimately lands in real memory.
Your programs use virtual memory addresses, and it is your OS that decides where in RAM to back them. If RAM fills up, the OS will use some space on the hard drive (swap) to continue working.
But the hard drive is slower than RAM, which is why the OS uses a page-replacement algorithm, typically an approximation of LRU such as the clock algorithm, to exchange pages of memory between the hard drive and RAM depending on the work being done, trying to keep the data most likely to be used in fast memory. To swap pages back and forth, the OS does not need to modify the virtual memory addresses your program sees; it only changes the mappings behind them.
That's a summary that overlooks a lot of things.
You want to understand how virtual memory works. There are lots of online resources about this; here's one I found that seems to do a fair job of explaining it without getting too deep into technical detail, while not glossing over important terms.
https://searchstorage.techtarget.com/definition/virtual-memory
For Linux on 32-bit x86 platforms, the assembly equivalent of asking for memory is basically a call into the kernel, historically using int 0x80 with the parameters for the call placed in registers (modern kernels enter via the faster sysenter/syscall instructions instead). The interrupt handler is installed at boot by the OS so it can answer such requests; it is registered in the IDT.
An IDT descriptor for 32-bit systems looks like:
struct IDTDescr {
    uint16_t offset_1;  // offset bits 0..15
    uint16_t selector;  // a code segment selector in GDT or LDT
    uint8_t  zero;      // unused, set to 0
    uint8_t  type_attr; // type and attributes
    uint16_t offset_2;  // offset bits 16..31
};
The offset is the address of the entry point of the handler for that interrupt. So interrupt 0x80 has an entry in the IDT, and this entry points to the address of the handler (also called an ISR). Note that malloc() itself is a library function, not a system call: when your program calls malloc(), the C library may in turn issue a system call (brk or mmap) to get more memory from the kernel. The system call returns the address of the allocated memory in a register. On modern x86 the kernel entry is usually made with the sysenter (or syscall) instruction rather than int 0x80; sysenter is used together with MSRs (Model Specific Registers) to jump securely from user mode into kernel mode at the address the OS configured in the MSR.
Once in kernel mode, all instructions can be executed and access to all hardware is unlocked. To fulfill the request, the OS doesn't "ask RAM for memory". RAM isn't aware of what memory the OS uses; it just blindly answers to asserted pins on its DIMM and stores information. At boot, the OS uses the ACPI tables built by the firmware (BIOS/UEFI) to determine how much RAM there is and which devices are connected to the computer, so that it avoids writing into MMIO (Memory-Mapped I/O) regions. Once the OS knows how much RAM is available (and which parts are usable), it uses its own allocation algorithms to determine which parts of available RAM each process should get.
When you compile C code, the compiler (and linker) determine the layout of the static data and code at compile/link time. When you launch that executable, the OS therefore knows the static memory the process will use, and it sets up the page tables for that process accordingly. When you ask for memory dynamically using malloc(), the OS determines what part of physical memory your process should get and changes the page tables accordingly, at runtime.
As for paging itself, you can always read some articles. The short version is 32-bit paging. In 32-bit paging, each CPU core has a CR3 register; it contains the physical address of the base of the Page Global Directory (PGD) of the currently running process. The PGD contains the physical addresses of the bases of several page tables, which themselves contain the physical addresses of the bases of physical pages (https://wiki.osdev.org/Paging). A virtual address is split into three parts: the 12 least-significant bits are the offset within the physical page, the middle 10 bits are the index into the page table, and the 10 most-significant bits are the index into the PGD.
So when you write
char* p = (char*)malloc(4096);
p[0] = 'a';
p[1] = 'b';
you create a pointer of type char* and make a call asking for 4096 bytes of memory; under the hood, the C library may issue a system call to get that memory from the kernel. The address of the first byte of that chunk comes back in a register chosen by the platform's calling convention (which register depends on the system and OS). You should not forget that the C language is just a convention: it is up to the platform's ABI, and a compiler written for that platform, to define which registers, and which interrupt or syscall mechanism, are used for system calls. The compiler thus takes the address returned in that register and stores it into your pointer of type char* at runtime. On the second line you tell the compiler to set the char at the first address to 'a'. On the third line you make the second char a 'b'. In the end, you could write an equivalent:
char* p = (char*)malloc(4096);
*p = 'a';
*(p + 1) = 'b';
p is a variable containing an address. The + operation on a pointer advances this address by the size of the pointed-to type. In this case, the pointer points to char, so + advances it by one char (one byte); if it pointed to an int, it would advance by sizeof(int) bytes (4 bytes on most systems). The size of the pointer itself depends on the system: on a 32-bit system the pointer is 32 bits wide (because it contains an address), and on a 64-bit system it is 64 bits wide. A static memory equivalent of what you did is
char p[4096];
p[0] = 'a';
p[1] = 'b';
Now the compiler knows at compile time what memory this array will get. It is static memory. Even then, p acts as a pointer to the first char of that array. It means you could write
char p[4096];
*p = 'a';
*(p + 1) = 'b';
It would have the same result.
First, OS requests memory allocation to RAM,…
The OS does not have to request memory. It has access to all of memory the moment it boots. It keeps its own database of which parts of that memory are in use for what purposes. When it wants to provide memory for a user process, it uses its own database to find some memory that is available (or does things to stop using memory for other purposes and then make it available). Once it chooses the memory to use, it updates its database to record that it is in use.
… then RAM gives physical memory address to OS.
RAM does not give addresses to the OS except that, when starting, the OS may have to interrogate the hardware to see what physical memory is available in the system.
Once OS receives physical address, OS maps the physical address to virtual address…
Virtual memory mapping is usually described as mapping virtual addresses to physical addresses. The OS has a database of the virtual memory addresses in the user process, and it has a database of physical memory. When it is fulfilling a request from the process to provide virtual memory and it decides to back that virtual memory with physical memory, the OS will inform the hardware of what mapping it chose. This depends on the hardware, but a typical method is that the OS updates some page table entries that describe what virtual addresses get translated to what physical addresses.
I wrote some values ('a' and 'b') through the virtual address and they are really written into main memory (RAM).
When your process writes to virtual memory that is mapped to physical memory, the processor will take the virtual memory address, look up the mapping information in the page table entries or other database, and replace the virtual memory address with a physical memory address. Then it will write the data to that physical memory.
Related
WARNING: This is long but I hope it can be useful for people like me in the future.
I think I know what the program counter is, how lazy memory allocation works, what the MMU does, how virtual addresses are mapped to physical addresses, and the purpose of the L1 and L2 caches. What I really have trouble with is how it all fits together at a high level when we run C code.
Suppose I have this C code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int* ptr;
    int n = 1000000, i = 0;
    // Dynamically allocate memory using malloc()
    ptr = (int*)malloc(n * sizeof(int));
    ptr[0] = 99;
    i += 100;
    printf("%d\n", ptr[0]);
    free(ptr);
    return 0;
}
So here is my attempt to put everything together:
After execve() is called, part of the executable is loaded into memory, e.g. the text and data segments, but most of the code is not; it is loaded on demand (demand paging).
The address of the first instruction is in the process table's program counter (PC) field as well as physically in the PC register, ready to be used.
As the CPU executes instructions, the PC is updated (usually it just advances to the next instruction, but a jump can set it to a different address).
Entering the main function: ptr, n, and i are on the stack.
Next, when we call malloc, the C library asks the OS (via the brk()/sbrk() system call, or mmap() for large allocations) for some memory on the heap.
malloc succeeds in this case, returning a virtual memory address (VMA), but the physical memory may not have been allocated yet. The page table doesn't contain a mapping for the VMA, so when the CPU tries to access it, a page fault is generated.
In our case, when we do ptr[0] = 99, the CPU raises a page fault. I am not sure if the entire array is allocated at that point or just the first page (4 KB), though.
But now I don't know how to fit cache access into the picture. How does i get into the L1 cache? How does that relate to the VMA?
Sorry if this is confusing. I just hope someone could help walk me through the entire process...
Before the program runs, the operating system and the C runtime set up the necessary values in the CPU registers.
As you've already noted, the intended PC value is set by the operating system (e.g. by the loader) and then the CPU's PC (aka IP) register is set, probably with a "return from interrupt" instruction that both switches to user mode (activating the virtual memory map for that process) along with loading the CPU with the proper PC value (a virtual address).
In addition, the SP register is set somehow: in some systems this will be done similar to the PC during the "return from interrupt", but in other (older) systems the user code sets the SP to a prearranged location. In either case the SP also holds a virtual memory address.
Usually the first instruction that runs in the user process is in a routine traditionally called _start, in a library called crt0 (C RunTime 0, i.e. startup). _start is usually written in assembly and handles the transition from the operating system to user mode. As needed, _start establishes anything else necessary for C code to be called, and then calls main. If main returns to _start, it does an exit syscall.
The CPU caches (and probably the TLBs) will be cold when _start's first instruction gets control. All addresses in user mode are virtual memory addresses that designate memory within the (virtual) address space of the process. The processor is running in user mode. Probably the operating system has preloaded the page holding _start (or at least the start of _start). So when the processor performs an instruction fetch from _start, it will probably TLB miss, but not page fault, and then cache miss.
The TLB is a set of registers forming a cache in the CPU that supports virtual-to-physical address translations/mappings. When the TLB misses, it is loaded from a structure in the virtual memory mapping for the process, such as the page tables. Since that first page is preloaded, the attempt to map will succeed, and the TLB will be filled with the proper mapping from the virtual PC page to the physical page. However, the L1/L2, etc. caches are also cold, so the next access causes a cache miss. The memory system satisfies the cache miss by filling a cache line at each level. Finally an instruction word or group of words is delivered to the processor, and it begins executing instructions.
If a virtual address for code (by way of the PC) or data (by some dereference) is not present in the TLB, then the processor will consult the page tables, and a miss there can cause a recoverable or non-recoverable page fault. Recoverable page faults are virtual to physical mappings that are not present in the page tables, because the data is on disc and operating system intervention is required; whereas non-recoverable faults are accesses to virtual memory that are in error, i.e. not allowed as they refer to virtual memory that has not been allocated/authorized by the operating system.
Variable i is known to main as a stack-relative location. So, when main wants to write to i, it will write to memory at an offset from the SP, e.g. SP+8 (i could also be a register variable, but I digress). Since the SP is a pointer holding a virtual memory address, i then has a virtual address. That virtual address goes through the steps described above: TLB mapping from virtual page to physical page, possible page faulting, and then a possible cache miss. Subsequent accesses will yield TLB hits and cache hits, so they run at full speed. (The operating system will probably also preload some, but not all, stack pages before running the process.)
A malloc operation will use some system calls that ultimately cause additional virtual memory to be added to the process. (Though, as you also note, malloc obtains more than enough for the current request, so system calls are not made on every malloc.) malloc will return a virtual memory address, i.e. a pointer in the user-mode virtual address space. For memory just obtained by a system call, the TLB and caches are also probably cold, and it is possible that the page is not even mapped yet. In the latter case, a recoverable page fault will happen and the OS will allocate a physical page to use. If the OS is smart it will know that this is a new data page, and so can fill it with zeros instead of loading it from the paging file. Then it will set up the page table entries for the proper mapping and resume the user process, which will probably then TLB miss, fill a TLB entry from the page tables, then cache miss, and fill cache lines from the physical page.
Given the following C code:
void *ptr;
ptr = malloc(100);
printf("Address: %p\n", ptr);
When compiling this code with GCC 4.9 on 64-bit Ubuntu and running it, the output is similar to this:
Address: 0x151ab10
The value 0x151ab10 seems a reasonable number, since my machine has 8 GB of RAM, but when compiling the same code with GCC 4.9 on 64-bit Mac OS X and running it, I get output similar to this:
Address: 0x7fb9cb43ed30
... which is strange, because 0x7fb9cb43ed30 is well above 8 GB. Is there some kind of bit masking one has to do on Mac OS X so that the real address of ptr can be printed out?
When processes run in general-purpose operating systems, the operating system constructs a “virtual” address space for each process, using assistance from hardware.
Whenever a process with a virtual address space accesses memory, the hardware translates the virtual address (in the process’ address space) to a physical address (in actual memory hardware), using special registers in the hardware and tables in system memory that describe how the translation should be done.[1] The operating system configures the registers and tables for each process.
Commonly, the operating system, or the loader (the software that loads programs into memory for execution) assigns various ranges of the virtual address space for various purposes. It may put the stack in one place, executable code in another, general space for allocatable memory in another, and special system data in another. These addresses may come from base locations set arbitrarily by human designers or from various calculations, or combinations of those.
So seeing a large virtual address is not unusual; it is simply a location that was assigned in an imaginary address space.
Footnote
[1] There are additional complications in translating virtual addresses to physical addresses. When the processor translates an address, the result may be that the desired location is not in physical memory at all. When this happens, the processor notifies the operating system. In response, the operating system can allocate some physical memory, read the necessary data from disk, update the memory map of the process so that the virtual address points to the newly allocated physical memory, and resume execution of the process. Then your process can continue as if the data were there all along. Additionally, when the system allocated physical memory, it may have had to make some memory available by writing data that was in memory to disk, and also removing it from the memory map of some process (possibly the same one). In this way, disk space becomes auxiliary memory, and processes can execute with more memory in their virtual address spaces than there is in actual physical memory.
I'd like to know why something like the C code below works in microcontrollers but not in mainstream computers:
// 1. Get memory location of GPIOA SET Register.
uint32_t *gpioa = (uint32_t *)(0x40020000 + 0x18);
// 2. Set bit to 1 to enable it.
*gpioa |= (1<<5);
Statement 1 works on computers, but trying to access that memory location in any way (as statement 2 does) leads to a segmentation fault.
Is the operating system blocking direct memory access in this way?
Yes, on typical multi-user systems, the operating system controls access to memory.
Your process has only a virtual address space. The operating system sets special registers or other features in the hardware to regulate your address space. Parts of your virtual address space are mapped to physical memory, and parts are not mapped at all. (A mapping specifies how a virtual address is translated to a physical address.) The operating system also determines whether you can read memory, write memory, or execute instructions from memory.
At times, the operating system may change what parts of memory your process can access. It may keep data your process is not currently using on disk and mark that part of your virtual address space inaccessible. When your process tries to access it, the hardware generates an exception, and the kernel handles the exception by reading the data from disk to memory, marking the memory accessible to your process, and restarting your process at the instruction that generated the exception.
To add to Eric Postpischil's answer: the memory locations of the various peripheral registers are different for every model of microcontroller you may try to program. So not only is your code not portable to a PC (where it segfaults), it may also fault or misbehave on a different microcontroller (unless it's the same family of microcontroller, and they're specifically designed to be compatible).
I am working on a video HAL application where I receive camera frame callbacks from the HAL layer. While programming I found that memcpy crashes when copying data from a physical address, while it works fine when copying from a virtual address. I searched for information about this aspect of memcpy but found nothing, not even on its man page.
So, my question is: does memcpy require a physical address or a virtual address? Is this documented anywhere?
memcpy is implemented in C or optimized in assembler. As such, it doesn't care what kind of address it gets; it just loads the addresses into CPU registers and executes move instructions.
It is the operating system and memory hardware architecture that are responsible for mapping any logical (virtual) address to a physical address.
Note also that with modern OS/memory architectures, each process gets its own address space. Passing addresses between address spaces will not work.
In these cases, the OS will probably provide functionality to exchange memory objects (shared or otherwise) between processes.
As Paul Ogilvie correctly explained, memcpy deals with user-space addresses. As such they are virtual addresses, not necessarily physical addresses.
In principle, for very large areas with suitable alignment, an implementation could optimize a copy by asking the OS to remap some of the destination's virtual pages as duplicates of the physical pages backing the source, marked copy-on-write so that the program only gets private copies if and when it writes to either array. In practice, though, the generic memcpy in glibc (see glibc-2.3.5/sysdeps/generic/memcpy.c) is a plain copy loop, and such remapping tricks belong to the kernel's virtual memory system rather than to memcpy itself. Either way it would be transparent to the programmer, who still provides addresses in user space.
I understand that if I try to print the address of an element of an array, it will be an address from virtual memory, not from real (physical) memory, i.e. DRAM.
printf("Addresses of A[5] and A[6] are %p and %p\n", (void *)&A[5], (void *)&A[6]);
I found the addresses were consecutive (the elements are chars). In reality they may not be consecutive, at least not in DRAM. I want to know the real addresses. How do I get them?
I need to know this for either Windows or Linux.
You can't get the physical address for a virtual address from user code; only the lowest levels of the kernel deal with physical addresses, and you'd have to intercept things there.
Note that the physical address for a virtual address may not be constant while the program runs — the page might be paged out from one physical address and paged back in to a different physical address. And if you make a system call, this remapping could happen between the time when the kernel identifies the physical address and when the function call completes because the program requesting the information was unscheduled and partially paged out and then paged in again.
The simple answer is that, in general, for user processes or threads in a multiprocessing OS such as Windows or Linux, it is not possible to find the physical address even of a static variable in the processor's memory address space, let alone the DRAM address.
There are a number of reasons for this:
The memory allocated to a process is virtual memory. The OS can remap this process memory from time to time from one physical address range to another, and there is no way to detect this remapping from within the user process. That is, the physical address of a variable can change during the lifetime of a process.
There is no general-purpose interface from userspace that lets a process walk the kernel's process table and page tables to find its physical addresses. On Linux, /proc/<pid>/pagemap comes close, but modern kernels hide the frame numbers from unprivileged readers; alternatively, you can write a kernel module or driver that does the walk.
The DRAM is mapped into the processor's address space through a memory management unit (MMU) and memory caches. Although the mapping of DRAM into the physical address space is usually set up only once, during system boot, the processor's use of caches means that values written to a variable might not be written through to the DRAM in all cases.
There are OS-specific ways to "pin" a block of allocated memory to a static physical location. This is often done by device drivers that use DMA. However, this requires a level of privilege not available to userspace processes, and, even if you have the physical address of such a block, there is no pragma or directive in the commonly used linkers that you could use to allocate the BSS for a process at such a physical address.
Even inside the Linux kernel, virtual to physical address translation is not possible in the general case, and requires knowledge about the means that were used to allocate the memory to which a particular virtual address refers.
Here is a link to an article called Translating Virtual to Physical Address on Windows: Physical Addresses that gives you a hint as to the extreme ends to which you must go to get physical addresses on Windows.