What exactly is "memory" in C programming?

I'm curious to know what "Memory" really stands for.
When I compile and execute this code:
#include <stdio.h>

int main(void)
{
    int n = 50;
    printf("%p\n", (void *)&n);
}
As we know, we get a Hex output like:
0x7ffeee63dabc
What does that Hex address physically stand for? Is it a part of my computer's L1 Cache? RAM? SSD?
Where can I read more about this, any references would be helpful. Thank you.
Some Background:
I've recently picked up learning Computer Science again after a break of a few years (I was working in industry as a low-code / no-code Web Developer) and realised there are a few gaps in my knowledge I want to colour in.
In learning C (via CS50x) I'm on the week of Memory. And I realise I don't actually know what Memory this is referring to. The course either assumes that the students already know this, or that it isn't pertinent to the context of this course (it's an Intro course so abstractions make sense to avoid going down rabbit holes), but I'm curious and I'd like to chase it down to find out the answers.

computer architecture 101
In your computer there is a CPU chip and there are RAM chips.
The CPU's job is to calculate things. The RAM's job is to remember things.
The CPU is in charge. When it wants to remember something, or look up something it's remembering, it asks the RAM.
The RAM has a bunch of slots where it can store things. Each slot holds 1 byte. The slot number (not the number in the slot, but the number of the slot) is called an address. Each slot has a different address. They start from 0 and go up: 0, 1, 2, 3, 4, ... Like letterboxes on a street, but starting from 0.
The way the CPU tells the RAM which thing to remember is by using a number called an address.
The CPU can say: "Put the number 126 into slot 73224." And it can say, "Which number is in slot 97221?"
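If it helps to see that in C: pretend RAM is one big byte array (a big simplification; the buffer and the slot numbers below are made up purely for illustration). Those two requests would look like:

#include <stdlib.h>

int main(void)
{
    /* pretend this buffer is the RAM and its indices are the slot numbers */
    unsigned char *ram = calloc(100000, 1);

    ram[73224] = 126;             /* "put the number 126 into slot 73224" */
    unsigned char n = ram[97221]; /* "which number is in slot 97221?"     */

    (void)n;  /* each slot holds 1 byte, hence unsigned char */
    free(ram);
}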
We normally write slot numbers (addresses) in hexadecimal, with 0x in front to remind us that they're hexadecimal. It's tradition.
How does the CPU know which address it wants to access? Simple: the program tells it.
operating systems 101
An operating system's job is to keep the system running smoothly.
That doesn't happen when faulty programs are allowed to access memory that doesn't belong to them.
So the operating system decides which memory the program is allowed to access, and which memory it isn't. It tells the CPU this information.
The "Am I allowed to access this memory?" information applies in 4 kilobyte chunks called "pages". Either you can access the entire page, or none of it. That's because if every byte had separate access information, you'd need to waste half your RAM just storing the access information!
If you try to access an address in a page that the OS said you can't access, the CPU narcs to the OS, which then stops running your program.
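You can watch the narcing happen. This program deliberately pokes an address that, on a typical desktop OS, sits in a page your program was never given (the exact address is arbitrary), so the OS stops it with something like "Segmentation fault":

#include <stdio.h>

int main(void)
{
    /* 0x10 is almost certainly in a page the OS never gave us */
    int *p = (int *)0x10;
    printf("%d\n", *p);  /* CPU narcs to the OS; OS stops the program */
}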
operating systems 102
Remember this shiny new "virtual memory" feature from the Windows 95 days?
"Virtual memory" means the addresses your program uses aren't the real RAM addresses.
Whenever you access an address, the CPU looks up the real address. This also uses pages. So the OS can make any "address page" go to any "real page".
These are not official terms - OS designers actually say that any "virtual page" can "map" to any "physical page".
If the OS wants a physical page but there aren't any left, it can pick one that's already used, save its data onto the disk, make a little note that it's on disk, and then it can reuse the page.
What if the program tries to access a page that's on disk? The OS lies to the CPU: it says "The program is not allowed to access this page." even though it is allowed.
When the CPU narcs to the OS, the OS doesn't stop the program. It pauses the program, finds something else to store on disk to make room, reads in the data for the page the program wants, then it unpauses the program and tells the CPU "actually, he's allowed to access this page now." Neat trick!
So that's virtual memory. The CPU doesn't know the difference between a page that's on disk, and one that's not allocated. Your program doesn't know the difference between a page that's on disk, and one that isn't. Your program just suffers a little hiccup when it has to get something from disk.
The only way to know whether a virtual page is actually stored in RAM (in a physical page), or whether it's on disk, is to ask the OS.
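On Linux, the way to ask is the mincore() system call. A minimal sketch (Linux-specific): it maps one page without touching it, so the page has no physical frame yet, then touches it and asks again:

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    /* map one page of anonymous memory, but don't touch it yet */
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    unsigned char vec;

    if (p == MAP_FAILED) return 1;

    mincore(p, page, &vec);
    printf("before touching: in RAM? %d\n", vec & 1);

    p[0] = 1;  /* first access; the OS faults the page in */

    mincore(p, page, &vec);
    printf("after touching:  in RAM? %d\n", vec & 1);
}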
Virtual page numbers don't have to start from 0; the OS can choose any virtual page number it wants.
computer architecture 102
A cache is a little bit of memory in the CPU so it doesn't have to keep asking the RAM chip for things.
The first time the CPU wants to read from a certain address, it asks the RAM chip. Then, it chooses something to delete from its cache, deletes it, and puts the value it just read into the cache instead.
Whenever the CPU wants to read from a certain address, it checks if it's in the cache first.
Things that are in the cache are also in RAM. It's not one or the other.
The cache typically stores chunks of 64 bytes, called cache lines. Not pages!
There isn't a good way to know whether a cache line is stored in the cache or not. Even the OS doesn't know.
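For the bookkeeping itself, here's a toy model of a cache lookup with made-up geometry (64-byte lines, 1024 sets, direct-mapped; real caches are set-associative, and none of this is visible to software):

#include <stdint.h>
#include <stdbool.h>

enum { LINE = 64, SETS = 1024 };  /* made-up geometry */

struct line {
    bool     valid;
    uint64_t tag;
    /* the 64 bytes of cached data would live here too */
};

static struct line cache[SETS];

/* is the 64-byte chunk containing addr present in this toy cache? */
bool cache_hit(uint64_t addr)
{
    uint64_t line_no = addr / LINE;    /* which 64-byte chunk of RAM  */
    uint64_t set     = line_no % SETS; /* the one slot it may occupy  */
    uint64_t tag     = line_no / SETS; /* identifies the chunk        */
    return cache[set].valid && cache[set].tag == tag;
}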
programming language design 101
C doesn't want you to know about all this stuff.
C is a set of rules about how you can and can't write programs. The people who design C don't want to have to explain all this stuff, so they make rules about what you can and can't do with pointers, and that's the end of it.
For example, the language doesn't know about virtual memory, because not all types of computers have virtual memory. Dishwashers or microwaves have no use for it and it would be a waste of money.
What does that Hex address physically stand for? Is it a part of my computer's L1 Cache? RAM? SSD?
The address 0x7ffeee63dabc means address 0xabc within virtual page 0x7ffeee63d. It might be on your SSD at the moment or in RAM; if you access it then it has to come into RAM. It might also be in cache at the moment but there's no good way to tell. The address doesn't change no matter where it goes to.
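A quick sketch of that split, assuming 4 KiB (2^12-byte) pages, which is why exactly 3 hex digits of offset peel off the right-hand end:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t addr = 0x7ffeee63dabc;  /* the address from the question */

    /* with 4 KiB pages, the low 12 bits (3 hex digits) are the offset */
    printf("virtual page 0x%llx, offset 0x%llx\n",
           (unsigned long long)(addr >> 12),
           (unsigned long long)(addr & 0xfff));
}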

You should think of memory as an abstract mapping from addresses to values, nothing more.
Whether your actual hardware implements it as a single chunk of memory, or as a complicated hierarchy of caches, is not relevant until you try to optimize for very specific hardware, which you will not want to do 99% of the time.

In general, memory is anything that stores data either temporarily or non-volatilely. Temporary memory is lost when the machine is turned off and is usually referred to as RAM or simply "memory". Non-volatile memory is kept on a hard disk, flash drive, EEPROM, etc. and is usually referred to as ROM or storage.
Caches are also a type of temporary memory, but they are referred to simply as cache and are not considered part of the RAM. The RAM in your PC is also referred to as "physical memory" or "main memory".
When programming, all the variables are usually in main memory (more on this later) and are brought into the caches (L1, L2, etc.) when they are being used. But the caches are, for the most part, transparent to application developers.
Now there is another thing to mention before I answer your question. The addresses of a program are not necessarily the addresses of the physical memory. The addresses are translated from "virtual addresses" to "physical addresses" by an MMU (memory management unit) or similar CPU feature. The OS handles the MMU. The MMU is used for many reasons; two of them are to hide and protect the OS's memory and other programs' memory from wrong memory accesses by a program. This way a program cannot access nor alter the OS's or other programs' memory.
Further, when there is not enough RAM to store all the memory that apps are requesting, the OS can store some of that memory in non volatile storage. Using virtual addresses, a program cannot easily know if the memory is actually in RAM or storage. This way programs can allocate a lot more memory than there is RAM. This is also why programs become very slow when they are consuming a lot of memory: it takes a long time to bring the data from storage back into main memory.
So, the address that you are printing is most likely the virtual address.
You can read something about those topics here:
https://en.wikipedia.org/wiki/Memory_management_(operating_systems)
https://en.wikipedia.org/wiki/Virtual_memory

From the C standard's point of view, memory is simply storage for objects. How it works and how it is organized are left to the implementation.
Even printing pointers is, from the C point of view, meaningless (though it can be informative and interesting from the implementation's point of view).

If your code is running under a modern operating system¹, pointer values almost certainly correspond to virtual memory addresses, not physical addresses. There's a virtual memory system that maps the virtual address your code sees to a physical address in main memory (RAM), but as pages get swapped in and out, that physical address may change.
¹ For desktops, anything newer than the mid-'90s. For mainframes and minis, almost anything newer than the mid-'60s.

Is it a part of my computer's L1 Cache? RAM? SSD?
The short answer is RAM. This address is usually associated with a unique location inside your RAM. The long answer is, well - it depends!
Most machines today have a Memory Management Unit (MMU) which sits in-between the CPU and the peripherals attached to it, translating 'virtual' addresses seen by a program to real ones that actually refer to something physically attached to the bus. Setting up the MMU and allotting memory regions to your program is generally the job of the Operating System. This allows for cool stuff like sharing code/data with other running programs and more.
So the address that you see here may not be the actual physical address of a RAM location at all. However, with the help of the MMU, the OS can accurately and quickly map this number to an actual physical memory location somewhere in the RAM and allow you to store data in RAM.
Now, any accesses to the RAM may be cached in one or more of the available caches. Alternatively, it could happen that your program memory temporarily gets moved to disk (swapfile) to make space for another program. But all of this is completely automatic and transparent to the programmer. As far as your program is concerned, you are directly reading from or writing to the available RAM and the address is your handle to the unique location in RAM that you are accessing.

Related

Why do we need address virtualization in an operating system?

I am currently taking a course in Operating Systems and I came across address virtualization. I will give a brief summary of what I know and follow that with my question.
Basically, the CPU (in modern microprocessors) generates virtual addresses and then an MMU (memory management unit) takes care of translating those virtual addresses to their corresponding physical addresses in the RAM. The example that was given by the professor for why there is a need for virtualization: You compile a C program. You run it. And then you compile another C program. You try to run it but the resident running program in memory prevents loading a newer program even when space is available.
From my understanding, I think having no virtualization, if the compiler generates two physical addresses that are the same, the second won't run because it thinks there isn't enough space for it. When we virtualize this, as in the CPU generates only virtual addresses, the MMU will deal with this "collision" and find a spot for the other program in RAM. (Our professor gave the example of the MMU being a mapping table that takes a virtual address and maps it to a physical address.) That idea seemed very similar to resolving collisions in a hash table.
Could I please get some input on my understanding and any further clarification is appreciated.
Could I please get some input on my understanding and any further clarification is appreciated.
Your understanding is roughly correct.
Clarifications:
The data structures are nothing like a hash table.
If anything, the data structures are closer to a B-tree, but even there, there are important differences as well. It is really closest to a (Java) N-dimensional array which has been sparsely allocated.
It is mapping pages rather than complete virtual / physical addresses. (A complete address is a page address + an offset within the page.)
There is no issue with collision. At any point in time, the virtual -> physical mappings for all users / processes give a one-to-one mapping from (process id + virtual page) to either a physical RAM page or a disk page (or both).
The reasons we use virtual memory are:
process isolation; i.e. one process can't see or interfere with another process's memory
simplifying application writing; i.e. each process thinks it has a contiguous set of memory addresses, and the same set each time. (To a first approximation ...)
simplifying compilation, linking, and loading; i.e. compilers, linkers, and loaders have no need to "relocate" code at compile time or run time to take other programs into account.
to allow the system to accommodate more processes than it has physical RAM for ... though this comes with potential risks and performance penalties.
I think you have a fundamental misconception about what goes on in an operating system in regard to memory.
(1) You are describing logical memory, not virtual memory. Virtual memory refers to the use of disk storage to simulate memory. Unmapped pages of logical memory get mapped to disk space.
Sadly, the terms logical memory and virtual memory get conflated, but they are distinct concepts, and the distinction is becoming increasingly important.
(2) Programs run in a PROCESS. A process only runs one program at a time (in unix, each process generally runs only one program in its life, two if you count the cloned caller).
In modern systems, each process gets a logical address space (sequential addresses) that can be mapped to physical locations or to no location at all. Generally, part of that logical address space is mapped to a kernel area that is shared by all processes. The logical address space is created with the process. No address space, no process.
In a 32-bit system, addresses 0-7FFFFFFF might be user addresses that are (generally) mapped to unique physical locations, while 80000000-FFFFFFFF might be mapped to a system address space that is the same for all processes.
(3) Logical memory management primarily serves as a means of security, not as a means for program loading (although it does help in that regard).
(4) This example makes no sense to me:
You compile a C program. You run it. And then you compile another C program. You try to run it but the resident running program in memory prevents loading a newer program even when space is available.
You are ignoring the concept of a PROCESS. A process can only have one program running at a time. In systems that do permit serial running of programs within the same process (e.g., VMS), the executing program prevents loading another program (or the loading of another program causes the termination of the running program). It is not a memory issue.
(5) This is not correct at all:
From my understanding, I think having no virtualization, if the compiler generates two physical addresses that are the same, the second won't run because it thinks there isn't enough space for it. When we virtualize this, as in the CPU generates only virtual addresses, the MMU will deal with this "collision" and find a spot for the other program in RAM.
The MMU does not deal with collisions. The operating system sets up tables that define the logical address space when the process starts. Logical memory has nothing to do with hash tables.
When a program accesses logical memory, the rough sequence is (sketched in code after this list):
Break down the address into a page and an offset within the page.
Does the page have a corresponding entry in the page table? If not, FAULT.
Is the entry in the page table valid? If not, FAULT.
Does the page table entry allow the type of access (read/write/execute) requested in the current operating mode (kernel/user/...)? If not, FAULT.
Does the entry map to a physical page? If not, PAGE FAULT (go load the page from disk, i.e. virtual memory, and try again).
Access the physical memory referenced by the page table.
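Here is that sequence as a C sketch. The pte_t structure and the single-level table are hypothetical simplifications (real page tables are multi-level and their layout is fixed by the hardware), but the checks run in this order:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

/* hypothetical single-level page table entry, for illustration only */
typedef struct {
    bool     valid;    /* does the entry exist at all?           */
    bool     present;  /* is the page currently in physical RAM? */
    bool     writable;
    uint32_t frame;    /* physical page frame number             */
} pte_t;

static void fault(const char *why)
{
    fprintf(stderr, "FAULT: %s\n", why);
    exit(1);
}

/* the sequence above, assuming 4 KiB pages (12 offset bits) */
uint32_t translate(const pte_t *table, uint32_t vaddr, bool is_write)
{
    uint32_t page   = vaddr >> 12;    /* break the address down...  */
    uint32_t offset = vaddr & 0xfff;  /* ...into page and offset    */
    const pte_t *pte = &table[page];

    if (!pte->valid)                fault("no valid page table entry");
    if (is_write && !pte->writable) fault("access type not allowed");
    if (!pte->present)              fault("page fault: page is on disk");

    return (pte->frame << 12) | offset;
}

int main(void)
{
    static pte_t table[1 << 20];  /* 2^20 pages cover a 32-bit space */
    table[0x12345] = (pte_t){ true, true, true, 0x00042 };

    printf("0x12345678 -> 0x%x\n", translate(table, 0x12345678, false));
    /* prints 0x42678: frame 0x42, plus offset 0x678 */
}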

Does the virtual memory area struct only come into the picture when there is a page fault?

Virtual memory is a quite complex topic for me. I am trying to understand it. Here is my understanding for a 32-bit system where, for example, the RAM is just 2 GB. I have tried reading many links, and I am not confident at the moment. I would like you to help me clear up my concepts. Please acknowledge my points, and also please answer where you feel I am wrong. I also have a confused section in my points. So, here starts the summary.
Every process thinks it is the only one running. It can access the 4 GB of memory - the virtual address space.
When a process accesses a virtual address, it is translated to a physical address via the MMU.
This MMU is a part of a CPU - a hardware.
When the MMU cannot translate the address to a physical one, it raises a page fault.
On a page fault, the kernel is notified. The kernel checks the VM area struct. If it can find the address (it may be on disk), it will do some page-in/page-out and get this memory into RAM.
Now the MMU will try again and will succeed this time.
In case the kernel cannot find the address, it will raise a signal. For example, an invalid access will raise a SIGSEGV.
Confused points.
Is the page table maintained in the kernel? Does this VM area struct have a page table?
How can the MMU fail to find the address in physical RAM? Let's say it translates to some wrong address in RAM: the code will still execute, but it will be a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM area every time?
Is the mapping table (virtual to physical) inside the MMU? I have read that it is maintained by each individual process. If it is inside a process, why can't I see it?
Or, if it is in the MMU, how does the MMU generate the address? Is it segment + 12-bit shift -> page frame number, and then the addition of the offset -> physical address?
Does it mean that for a 32-bit architecture, with this calculation in mind, I can determine the physical address from a virtual address?
cat /proc/pid_value/maps shows me the current mapping of the VM areas. Basically, it reads the vm_area struct and prints it. That means this is important, but I am not able to fit this piece into the complete picture. When the program is executed, is the vm_area struct generated? Does the VM area come into the picture only when the MMU cannot translate the address, i.e. on a page fault? When I print the VM areas, it displays the address range, permissions, the mapped file descriptor, and an offset. I am sure this file descriptor refers to a file on the hard disk, and the offset is into that file.
The high-mem concept is that the kernel cannot directly access memory regions greater than 1 GB (approx.). Thus, it needs a page table to indirectly map them, so it will temporarily load some page table to map the address. Does HIGHMEM come into the picture every time? Because userspace can directly translate addresses via the MMU. In what scenario does the kernel really want to access high memory? I believe kernel drivers will mostly be using kmalloc, which is a direct memory + offset address, so no mapping is really required there. So, the question is: in what scenario does the kernel need to access high memory?
Does the processor specifically come with MMU support? Can those that don't have MMU support not run Linux?
Is the page table maintained in the kernel? Does this VM area struct have a page table?
Yes, and not exactly: each process has a mm_struct, which contains a list of vm_area_struct's (which represent abstract, processor-independent memory regions, aka mappings), and a field called pgd, which is a pointer to the processor-specific page table (which contains the current state of each page: valid, readable, writable, dirty, ...).
The page table doesn't need to be complete, the OS can generate each part of it from the VMAs.
How can the MMU fail to find the address in physical RAM? Let's say it translates to some wrong address in RAM: the code will still execute, but it will be a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM area every time?
The translation fails, e.g. because the page was marked as invalid, or a write access was attempted against a readonly page.
Is the mapping table (virtual to physical) inside the MMU? I have read that it is maintained by each individual process. If it is inside a process, why can't I see it?
Or, if it is in the MMU, how does the MMU generate the address? Is it segment + 12-bit shift -> page frame number, and then the addition of the offset -> physical address?
Does it mean that for a 32-bit architecture, with this calculation in mind, I can determine the physical address from a virtual address?
There are two kinds of MMUs in common use. One of them only has a TLB (Translation Lookaside Buffer), which is a cache of the page table. When the TLB doesn't have a translation for an attempted access, a TLB miss is generated, the OS does a page table walk, and puts the translation in the TLB.
The other kind of MMU does the page table walk in hardware.
In any case, the OS maintains a page table per process; this maps virtual page numbers to physical frame numbers. This mapping can change at any moment: when a page is paged in, the physical frame it is mapped to depends on the availability of free memory.
cat /proc/pid_value/maps shows me the current mapping of the VM areas. Basically, it reads the vm_area struct and prints it. That means this is important, but I am not able to fit this piece into the complete picture. When the program is executed, is the vm_area struct generated? Does the VM area come into the picture only when the MMU cannot translate the address, i.e. on a page fault? When I print the VM areas, it displays the address range, permissions, the mapped file descriptor, and an offset. I am sure this file descriptor refers to a file on the hard disk, and the offset is into that file.
To a first approximation, yes. Beyond that, there are many reasons why the kernel may decide to fiddle with a process' memory, e.g: if there is memory pressure it may decide to page out some rarely used pages from some random process. User space can also manipulate the mappings via mmap(), execve() and other system calls.
The high-mem concept is that the kernel cannot directly access memory regions greater than 1 GB (approx.). Thus, it needs a page table to indirectly map them, so it will temporarily load some page table to map the address. Does HIGHMEM come into the picture every time? Because userspace can directly translate addresses via the MMU. In what scenario does the kernel really want to access high memory? I believe kernel drivers will mostly be using kmalloc, which is a direct memory + offset address, so no mapping is really required there. So, the question is: in what scenario does the kernel need to access high memory?
Totally unrelated to the other questions. In summary, high memory is a hack to be able to access lots of memory in a limited address space computer.
Basically, the kernel has a limited address space reserved to it (on x86, a typical user/kernel split is 3 GB/1 GB [processes can run in user space or kernel space; a process runs in kernel space when a syscall is invoked; to avoid having to switch the page table on every context switch, on x86 the address space is typically split between user space and kernel space]). So the kernel can directly access up to ~1 GB of memory. To access more physical memory, there is some indirection involved, which is what high memory is all about.
Does the processor specifically come with MMU support? Can those that don't have MMU support not run Linux?
Laptop/desktop processors come with an MMU. x86 has supported paging since the 386.
Linux, specifically the variant called µClinux, supports processors without MMUs (!MMU). Many embedded systems (ADSL routers, ...) use processors without an MMU. There are some important restrictions, among them:
Some syscalls don't work at all: e.g. fork().
Some syscalls work with restrictions and non-POSIX-conforming behavior: e.g. mmap().
The executable file format is different: e.g. bFLT or ELF-FDPIC instead of ELF.
The stack cannot grow, and its size has to be set at link-time.
When a program is loaded, does the kernel first set up a kernel VM area for that process? This kernel VM area actually holds where the program sections are in memory/HDD. Then the whole story of updating the CR3 register, and the page table walk or TLB, comes into the picture, right? So whenever there is a page fault, the kernel will update the page table by looking at the kernel virtual memory area, is that right? But they say the kernel VM area keeps updating. How is this possible, since cat /proc/pid_value/maps will keep updating; the map won't be constant from start to end. So the real information is available in the kernel VM area struct, is it? This is the actual information about where the sections of the program lie, whether on the HDD or in physical memory (RAM)? Is this filled during process loading, as the first job? The kernel does the page-in/page-out on page fault, and will update the kernel VM area, right? So it should also know the entire program's location on the HDD for page-in/page-out, right? Please correct me here. This is in continuation of my first question in the previous comment.
When the kernel loads a program, it will set up several VMAs (mappings), according to the segments in the executable file (which, for ELF files, you can see with readelf --segments): the text/code segment, data segment, etc. During the lifetime of the program, additional mappings may be created by the dynamic/runtime linkers, by the memory allocator (malloc(), which may also extend the data segment via brk()), or directly by the program via mmap(), shm_open(), etc.
The VMAs contain the necessary information to generate the page table, e.g. they tell whether that memory is backed by a file or by swap (anonymous memory). So, yes, the kernel will update the page table by looking at the VMAs. The kernel will page in memory in response to page faults, and will page out memory in response to memory pressure.
Using x86 without PAE as an example:
On x86 with no PAE, a linear address can be split into 3 parts: the top 10 bits point to an entry in the page directory, and the middle 10 bits point to an entry in the page table pointed to by the aforementioned page directory entry. The page table entry may contain a valid physical frame number: the top 20 bits of a physical address. The bottom 12 bits of the virtual address are an offset into the page and go untranslated into the physical address.
Each time the kernel schedules a different process, the CR3 register is written with a pointer to the page directory for the current process. Then, each time a memory access is made, the MMU tries to find a translation cached in the TLB; if it doesn't find one, it does a page table walk starting from CR3. If it still doesn't find one, a page fault is raised, the CPU switches to ring 0 (kernel mode), and the kernel tries to find a translation in the VMAs.
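The 10/10/12 split is easy to express in code; the linear address below is an arbitrary example:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t linear = 0x0804a1c4;  /* an arbitrary example address */

    uint32_t dir    = linear >> 22;           /* top 10 bits: page directory index */
    uint32_t table  = (linear >> 12) & 0x3ff; /* middle 10 bits: page table index  */
    uint32_t offset = linear & 0xfff;         /* bottom 12 bits: offset in page    */

    printf("directory %u, table %u, offset 0x%x\n", dir, table, offset);
}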
Also, I believe this reading from CR3, page directory -> page table -> page frame number -> memory address, is all done by the MMU. Am I correct?
On x86, yes, the MMU does the page table walk. On other systems (e.g: MIPS), the MMU is little more than the TLB, and on TLB miss exceptions the kernel does the page table walk by software.
Though this is not going to be the best answer, I would like to share my thoughts on the confused points.
1. Is the page table maintained in the kernel?...
Yes, the kernel maintains the page tables. In fact, it maintains nested page tables, and the top of the page tables is stored in top_pmd (pmd stands for Page Middle Directory). You can traverse all the page tables using this structure.
2. How can the MMU fail to find the address in physical RAM?...
I am not sure I understood the question. But if, because of some problem, an instruction faults or something outside its instruction area is accessed, you generally get an undefined-instruction exception resulting in an abort. If you look at the crash dumps, you can see it in the kernel log.
3. Is the mapping table (virtual to physical) inside the MMU?...
Yes. The MMU is SW + HW. The HW is the TLB and such; the mapping tables are stored there. For instructions, that is, for the code section, I always converted between physical and virtual addresses, and they always matched. And almost all the time it matches for data sections as well.
4. cat /proc/pid_value/maps shows me the current mapping of the VM areas...
This is more used for analyzing the virtual addresses of user-space stacks. As you know, virtually all user-space programs can have 4 GB of virtual address space. So unlike in the kernel, if I say 0xc0100234, you cannot directly go and point to the instruction. You need this mapping plus the virtual address to locate the instruction based on the data you have.
5. The high-mem concept is that the kernel cannot directly access the memory...
High-mem corresponds to user-space memory (someone correct me if I am wrong). When the kernel wants to read some data from an address in user space, it will be accessing HIGHMEM.
6. Does the processor specifically come with MMU support? Can those that don't have MMU support not run Linux?
The MMU, as I mentioned, is HW + SW. So mostly it comes with the chipset, and the SW is generally architecture-dependent. You can disable the MMU in the kernel config and build; I have never tried it, though. Mostly, these days, all the chipsets have it, but on small boards I think they disable the MMU. I am not entirely sure, though.
As all these are conceptual questions, I may be lacking some knowledge and be wrong in places. If so, others please correct me.

Is the Kernel Virtual Memory struct first formed when the process is about to execute?

I have been puzzling over similar questions indirectly in my other posts. Now my understanding is better; thus, my questions are better. So, I want to summarize the facts here. This example is based on an x86 32-bit system.
Please say yes/no to my points. If no, then please explain.
The MMU will look into the CR3 register to find the process's page directory base address.
The CR3 register is set by the kernel.
Now the MMU, after reading the page directory base address, will offset to the page table index (calculated from the VA); from there it will read the page frame number, and then it will find the offset within the page frame based on the VA given. It gets the physical memory address. All this is done in the MMU, right? When the MMU is disabled, who will do all this circus? If software, then it will be slow, right?
I know that a page fault occurs when the MMU cannot resolve the address. The kernel is informed. The kernel will update the page table based on reading the kernel virtual memory area struct. Am I correct?
Keeping point 4 in mind: does it mean that before executing any process, perhaps during process loading, the kernel first fills the kernel virtual memory area struct? For example, where the sections of memory will be: BSS, code, DS, etc. It could be that some sections are in RAM and some are on the storage device. When the sections of the program are moved from storage to main memory, I am assuming the kernel updates the kernel virtual memory area struct. Am I correct here? So it is the kernel that keeps close track of the program's location, whether on the storage device or in RAM (inode number of the device and file offset).
Sequence-wise: during process loading (maybe by a loader program), the kernel will populate the data in the kernel virtual memory area struct. It will also set the CR3 register. Now the process starts executing; it will initially get some frequent page faults. The VM area struct will be updated (if required) and then the page table, and the MMU will succeed in translating the address. So when I say a process is accessing memory, it is the MMU which is accessing the memory on behalf of the process. This is all about user space. Kernel space is entirely different: kernel space doesn't need the MMU, it can map directly to physical addresses (low mem). For high mem (to access user space from kernel space), it will do a temporary page table update internally; this is a separate, temporary page table for the kernel. The kernel space doesn't need the MMU. Am I correct?
When the MMU is disabled, who will do all this circus?
Nobody. All this circus is intended to do two things: translate the virtual address you gave it into a real address, and, if it can't do that, abort the instruction entirely and start executing a routine addressed from an architecturally pre-defined address; see "page fault" for the basic one.
When the MMU is shut off, no translation is done and the address you gave it is fed directly down the CPU's address-processing pipe just as any address the MMU might have translated it to would have been.
So when I say a process is accessing memory, it is the MMU which is accessing the memory on behalf of the process.
You're on the right track here: the MMU is mediating the access, but it isn't doing the access. It's doing only what you described before, translating it. What's generally called the Load/Store unit gets it next, and it's the one that handles talking to whatever holds the closest good copy of the data at that address; it "does the access".
Kernel space doesn't need the MMU, it can map directly to physical addresses
That depends on how you define "need". It can certainly shut it off, but it almost never does. First, it has to talk to user space, and the MMU has to be running to translate what user space has to addresses the Load-Store unit can use. Second, the flexibility and protection provided by the MMU are very valuable, they're not discarded without a really compelling reason. I know at least one OS will (or would, it's been a while) run some bulk copies MMU-off, but that's about it.

How can we specify a physical address for a variable?

Any suggestions/discussions are welcome!
The question is actually as brief as the title, but I'll explain why I need the physical address.
Background:
These days I'm fascinated by caches and multi-core architecture, and now I'm quite curious how caches influence our programs in a parallel environment.
In some CPU models (for example, my Intel Core Duo T5800), the L2 cache is shared among cores. So, if program A is accessing memory at physical addresses like
0x00000000, 0x20000000, 0x40000000...
and program B accessing data at
0x10000000, 0x30000000, 0x50000000...
Since these addresses share the same suffix, the related set in the L2 cache will be flushed frequently, and we'd expect to see the two programs fighting with each other, reading data slowly from memory instead of cache, even though they run on different cores.
Then I want to verify the result in practice. In this experiment, I have to know the physical address instead of virtual address. But how can I cope with this?
The first attempt:
Take a large space from the heap, mask the address, and get the desired address.
My CPU has an L2 cache with size = 2048 KB and associativity = 8, so each way is 2048 KB / 8 = 256 KB = 0x40000 bytes, and physical addresses 0x40000 apart, like 0x12340000, 0x12380000, 0x123c0000, will be related to the same (first) set in the L2 cache.
#include <stdint.h>

int HEAP[200000000] = {0};
int *v[2];

int main(void)
{
    /* round up to the next 0x40000-byte (256 KB) boundary inside HEAP */
    v[0] = (int *)(((uintptr_t)HEAP + 0x3ffff) & ~(uintptr_t)0x3ffff);
    v[1] = (int *)((uintptr_t)v[0] + 0x40000);
    /* one program pollutes v[0], another pollutes v[1] */
}
Sadly, with the "help" of virtual memory, the variable HEAP is not necessarily contiguous in physical memory, so v[0] and v[1] might be related to different cache sets.
The second attempt
Access /proc/self/mem and try to get memory information.
Hmm... it seems the results are still about virtual memory.
Your understanding of memory and these addresses is incomplete/incorrect. Essentially, what you're trying to test is futile.
In the context of user-mode processes, pretty much every single address you see is a virtual address. That is, an address that makes sense only in the context of that process. The OS manages the mapping of where this virtual memory space (unique to a process) maps to memory pages. These memory pages at any given time may map to pages that are paged-in (i.e. reside in physical RAM) - or they may be paged-out, and exist only in the swap file on disk.
So to address the Background example, those addresses are from two different processes - it means absolutely nothing to try and compare them. Whether or not their code is present in any of the caches depends on a number of things, including the cache-replacement strategy of the processor, the caching policies enabled by the OS, the number of other processes (including kernel-mode threads), etc.
In your first attempt, again you aren't going to get anywhere as far as actually testing CPU cache directly. First of all, your large buffer is not going to be on the heap. It is going to be part of the data section (specifically the .bss) of the executable. The heap is used for the malloc() family of memory allocations. Secondly, it doesn't really matter if you allocate some huge 1GB region, because although it is contiguous in the virtual address space of your process, it is up to the OS to allocate pages of virtual memory wherever it deems fit - which may not actually be contiguous. Again, you have pretty much no control over memory allocation from userspace. "Is there a way to allocate contiguous physical memory from userspace in linux?" The short answer is No.
/proc/$pid/maps isn't going to get you anywhere either. Yes there are plenty of addresses listed in there, but again, they are all in the virtual address space of process $pid. Some more information on these: How do I read from /proc/$pid/mem under Linux?
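For completeness: Linux does expose the virtual-to-physical mapping, read-only, through /proc/self/pagemap (one 64-bit entry per virtual page; bit 63 says whether the page is present, and the low 55 bits hold the physical frame number). Seeing real frame numbers requires root on modern kernels. A sketch:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
    int n = 50;  /* already written to, so its page is resident */
    long pagesize = sysconf(_SC_PAGESIZE);
    uintptr_t vaddr = (uintptr_t)&n;

    FILE *f = fopen("/proc/self/pagemap", "rb");
    if (!f) { perror("pagemap"); return 1; }

    uint64_t entry;
    fseek(f, (long)(vaddr / pagesize * sizeof entry), SEEK_SET);
    if (fread(&entry, sizeof entry, 1, f) != 1) { fclose(f); return 1; }
    fclose(f);

    if (entry >> 63) {  /* bit 63: page present in RAM */
        uint64_t pfn = entry & ((1ULL << 55) - 1);
        printf("virtual %#lx -> physical %#llx\n", (unsigned long)vaddr,
               (unsigned long long)(pfn * pagesize + vaddr % pagesize));
    } else {
        printf("page not present, or PFN hidden (run as root)\n");
    }
}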

C accessing memory location

I mean the physical memory, the RAM.
In C you can access any memory address, so how does the operating system prevent your program from changing memory addresses which are not in your program's memory space?
Does it set specific memory addresses as the beginning and end for each program? If so, how does it know how much is needed?
Your operating system kernel works closely with memory management unit (MMU) hardware, when the hardware and OS both support this, to make it impossible to access memory you have been disallowed access to.
Generally speaking, this also means the addresses you access are not physical addresses but rather are virtual addresses, and hardware performs the appropriate translation in order to perform the access.
This is what is called a memory protection. It may be implemented using different methods. I'd recommend you start with a Wikipedia article on this subject — http://en.wikipedia.org/wiki/Memory_protection
Actually, your program is allocated virtual memory, and that's what you work with. The OS gives you a part of the RAM, you can't access other processes' memory (unless it's shared memory, look it up).
It depends on the architecture, on some it's not even possible to prevent a program from crashing the system, but generally the platform provides some means to protect memory and separate address space of different processes.
This has to do with a thing called 'paging', which is provided by the CPU itself. In old operating systems, you had 'real mode', where you could directly access memory addresses. In contrast, paging gives you 'virtual memory', so that you are not accessing the raw memory itself, but rather, what appears to your program to be the entire memory map.
The operating system does "memory management", often coupled with TLBs (Translation Lookaside Buffers) and virtual memory, which translate addresses to pages that the operating system can tag as readable or executable in the current process's context.
The minimum requirement for a processor's MMU (memory management unit) is to restrict, in the current context, the accessible memory to a range which can only be set in processor registers in supervisor mode (as opposed to user mode).
The logical address is generated by the CPU and is mapped to the physical address by the memory management unit. Unlike the physical address space, the logical address space is not restricted by memory size, and you just get to work with the logical address space. The address binding is done by the MMU, so you never deal with the physical address directly.
Most computers (and all PCs since the 386) have something called the Memory Management Unit (or MMU). Its job is to translate local addresses used by a program into the physical addresses needed to fetch real bytes from real memory. It's the operating system's job to program the MMU.
As a result of this, programs can be loaded into any region of memory and appear, from that program's point of view while executing, to be at any other address. It's common to find that the code of all programs appears (locally) to be at the same address and their data always appears (locally) to be at the same address, even though physically they will be in different locations. With each memory access, the MMU transparently translates from the local address space to the physical one.
If a program tries to access a memory address that has not been mapped into its local address space, the hardware generates an exception; this typically gets flagged as a "segmentation violation", followed by the forcible termination of the program. This protects the memory of other processes from being accessed.
But that doesn't have to be the case! On systems with "virtual memory" and current resource demands on RAM that exceed the amount of physical memory, some pages (just blocks of memory of a common size, often on the order of 4-8 KB) can be written out to disk and their RAM given to a program trying to allocate and use new memory. Later on, when that page is needed by whatever program owns it, the memory access causes an exception and the OS swaps out some other memory page and re-loads the needed one from disk. The program that "page-faulted" gets delayed while this happens but otherwise notices nothing.
There are lots of other tricks the MMU/OS can do as well, like sharing memory between processes, making a disk file appear to be direct-memory-accessible, setting some pages as "NX" so they can't be treated as executable code, using arbitrary sections of the logical memory space regardless of how much and at what address the physical ram uses, and more.
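One of those tricks, making a disk file appear to be direct-memory-accessible, is available to programs via mmap() on POSIX systems. A small sketch (the file path is just an example; use any non-empty readable file):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);  /* any readable file */
    if (fd < 0) return 1;

    struct stat st;
    fstat(fd, &st);

    /* the file's pages now appear directly in our address space;
       they are paged in from disk on first access */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) return 1;

    fwrite(data, 1, st.st_size, stdout);

    munmap(data, st.st_size);
    close(fd);
}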
