How can we specify a physical address for a variable? - c

Any suggestions/discussions are welcome!
The question is actually as brief as the title, but I'll explain why I need the physical address.
Background:
These days I'm fascinated by caches and multi-core architecture, and I'm quite curious how the cache influences our programs in a parallel environment.
In some CPU models (for example, my Intel Core Duo T5800), the L2 cache is shared between the cores. So, if program A is accessing memory at physical addresses like
0x00000000, 0x20000000, 0x40000000...
and program B accessing data at
0x10000000, 0x30000000, 0x50000000...
Since these addresses share the same low-order bits (suffix), they map to the same set in the L2 cache, which will therefore be evicted frequently. We'd expect to see the two programs fighting with each other, reading data slowly from memory instead of the cache, even though they run on different cores.
Then I want to verify this result in practice. For this experiment, I have to know physical addresses rather than virtual addresses. But how can I cope with this?
The first attempt:
Eat a large space from the heap, mask the address, and pick out suitable ones.
My CPU has an L2 cache with size = 2048 KB and associativity = 8, so each way covers 2048 KB / 8 = 256 KB (0x40000). Physical addresses like 0x12340000, 0x12380000, 0x123c0000, which are 0x40000 apart, all map to the first set in the L2 cache.
#include <stdint.h>

int HEAP[200000000] = {0};   /* ~800 MB; note: this actually lands in .bss, not the heap */
int *v[2];

int main(int argc, char **argv) {
    /* Round HEAP up to the next 256 KB (0x40000) boundary; addresses that
       are 0x40000 apart should then map to the same L2 set. */
    v[0] = (int *)(((uintptr_t)HEAP + 0x3ffff) & ~(uintptr_t)0x3ffff);
    v[1] = (int *)((uintptr_t)v[0] + 0x40000);
    /* one program pollutes v[0], another pollutes v[1] */
    return 0;
}
Sadly, with the "help" of virtual memory, the variable HEAP is not necessarily contiguous in physical memory, so v[0] and v[1] might map to different cache sets.
The second attempt:
Access /proc/self/mem and try to get memory information from it.
Hmm... it seems that the results are still about virtual memory.

Your understanding of memory and these addresses is incomplete/incorrect. Essentially, what you're trying to test is futile.
In the context of user-mode processes, pretty much every single address you see is a virtual address - that is, an address that makes sense only in the context of that process. The OS manages the mapping of this virtual address space (unique to each process) onto memory pages. At any given time, those pages may be paged in (i.e. resident in physical RAM) or paged out, existing only in the swap file on disk.
So to address the Background example, those addresses are from two different processes - it means absolutely nothing to try and compare them. Whether or not their code is present in any of the caches depends on a number of things, including the cache-replacement strategy of the processor, the caching policies enabled by the OS, the number of other processes (including kernel-mode threads), etc.
In your first attempt, again you aren't going to get anywhere as far as actually testing CPU cache directly. First of all, your large buffer is not going to be on the heap. It is going to be part of the data section (specifically the .bss) of the executable. The heap is used for the malloc() family of memory allocations. Secondly, it doesn't really matter if you allocate some huge 1GB region, because although it is contiguous in the virtual address space of your process, it is up to the OS to allocate pages of virtual memory wherever it deems fit - which may not actually be contiguous. Again, you have pretty much no control over memory allocation from userspace. "Is there a way to allocate contiguous physical memory from userspace in linux?" The short answer is No.
/proc/$pid/maps isn't going to get you anywhere either. Yes there are plenty of addresses listed in there, but again, they are all in the virtual address space of process $pid. Some more information on these: How do I read from /proc/$pid/mem under Linux?
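If you want to see those virtual ranges for yourself, here is a minimal Linux-specific sketch that simply dumps the calling process's own map:

#include <stdio.h>

/* Dump this process's own memory map; every range printed is virtual. */
int main(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) { perror("fopen"); return 1; }
    int c;
    while ((c = fgetc(f)) != EOF)
        putchar(c);
    fclose(f);
    return 0;
}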

Related

What exactly is "memory" in C Programming?

I'm curious to know what "Memory" really stands for.
When I compile and execute this code:
#include <stdio.h>

int main(void)
{
    int n = 50;
    printf("%p\n", (void *)&n);   /* %p expects a void * */
}
As we know, we get a Hex output like:
0x7ffeee63dabc
What does that Hex address physically stand for? Is it a part of my computer's L1 Cache? RAM? SSD?
Where can I read more about this, any references would be helpful. Thank you.
Some Background:
I've recently picked up learning Computer Science again after a break of a few years (I was working in the industry as a low-code / no-code Web Developer) and realised there are a few gaps in my knowledge I want to colour in.
In learning C (via CS50x) I'm on the week of Memory. And I realise I don't actually know what Memory this is referring to. The course either assumes that the students already know this, or that it isn't pertinent to the context of this course (it's an Intro course so abstractions make sense to avoid going down rabbit holes), but I'm curious and I'd like to chase it down to find out the answers.
computer architecture 101
In your computer there is a CPU chip and there are RAM chips.
The CPU's job is to calculate things. The RAM's job is to remember things.
The CPU is in charge. When it wants to remember something, or look up something it's remembering, it asks the RAM.
The RAM has a bunch of slots where it can store things. Each slot holds 1 byte. The slot number (not the number in the slot, but the number of the slot) is called an address. Each slot has a different address. They start from 0 and go up: 0, 1, 2, 3, 4, ... Like letterboxes on a street, but starting from 0.
The way the CPU tells the RAM which thing to remember is by using a number called an address.
The CPU can say: "Put the number 126 into slot 73224." And it can say, "Which number is in slot 97221?"
We normally write slot numbers (addresses) in hexadecimal, with 0x in front to remind us that they're hexadecimal. It's tradition.
How does the CPU know which address it wants to access? Simple: the program tells it.
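If it helps, you can picture the CPU/RAM conversation with a plain array standing in for the RAM slots. This is only a toy model, not how real hardware is programmed:

#include <stdio.h>

static unsigned char RAM[100000];   /* pretend RAM: one numbered slot per byte */

int main(void) {
    RAM[73224] = 126;               /* "put the number 126 into slot 73224" */
    printf("slot 73224 (0x%X) holds %d\n", 73224u, RAM[73224]);
    return 0;
}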
operating systems 101
An operating system's job is to keep the system running smoothly.
That doesn't happen when faulty programs are allowed to access memory that doesn't belong to them.
So the operating system decides which memory the program is allowed to access, and which memory it isn't. It tells the CPU this information.
The "Am I allowed to access this memory?" information applies in 4 kilobyte chunks called "pages". Either you can access the entire page, or none of it. That's because if every byte had separate access information, you'd need to waste half your RAM just storing the access information!
If you try to access an address in a page that the OS said you can't access, the CPU narcs to the OS, which then stops running your program.
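To make the page granularity concrete, here's a minimal sketch splitting an address into its 4 KB page number and offset (the 0x7ffe... value is just the address from the question):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t addr = 0x7ffeee63dabc;   /* the address from the question  */
    uint64_t page = addr >> 12;       /* 4 KB pages: 12 bits of offset  */
    uint64_t off  = addr & 0xfff;     /* position within the page       */
    printf("page 0x%llx, offset 0x%03llx\n",
           (unsigned long long)page, (unsigned long long)off);
    return 0;
}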
operating systems 102
Remember this shiny new "virtual memory" feature from the Windows 95 days?
"Virtual memory" means the addresses your program uses aren't the real RAM addresses.
Whenever you access an address, the CPU looks up the real address. This also uses pages. So the OS can make any "address page" go to any "real page".
These are not official terms - OS designers actually say that any "virtual page" can "map" to any "physical page".
If the OS wants a physical page but there aren't any left, it can pick one that's already used, save its data onto the disk, make a little note that it's on disk, and then it can reuse the page.
What if the program tries to access a page that's on disk? The OS lies to the CPU: it says "The program is not allowed to access this page." even though it is allowed.
When the CPU narcs to the OS, the OS doesn't stop the program. It pauses the program, finds something else to store on disk to make room, reads in the data for the page the program wants, then it unpauses the program and tells the CPU "actually, he's allowed to access this page now." Neat trick!
So that's virtual memory. The CPU doesn't know the difference between a page that's on disk, and one that's not allocated. Your program doesn't know the difference between a page that's on disk, and one that isn't. Your program just suffers a little hiccup when it has to get something from disk.
The only way to know whether a virtual page is actually stored in RAM (in a physical page), or whether it's on disk, is to ask the OS.
Virtual page numbers don't have to start from 0; the OS can choose any virtual page number it wants.
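On Linux, the way to "ask the OS" whether a page is currently in RAM is the mincore() system call. A minimal sketch (the mmap result is page-aligned, which mincore requires):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long pg = sysconf(_SC_PAGESIZE);
    /* A fresh anonymous page is not resident until it's first touched. */
    char *p = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned char vec;
    mincore(p, pg, &vec);                /* ask the OS: is this page in RAM? */
    printf("before touch: %s\n", (vec & 1) ? "in RAM" : "not resident");

    p[0] = 1;                            /* first touch faults the page in   */
    mincore(p, pg, &vec);
    printf("after touch:  %s\n", (vec & 1) ? "in RAM" : "not resident");

    munmap(p, pg);
    return 0;
}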
computer architecture 102
A cache is a little bit of memory in the CPU so it doesn't have to keep asking the RAM chip for things.
The first time the CPU wants to read from a certain address, it asks the RAM chip. Then, it chooses something to delete from its cache, deletes it, and puts the value it just read into the cache instead.
Whenever the CPU wants to read from a certain address, it checks if it's in the cache first.
Things that are in the cache are also in RAM. It's not one or the other.
The cache typically stores chunks of 64 bytes, called cache lines. Not pages!
There isn't a good way to know whether a cache line is stored in the cache or not. Even the OS doesn't know.
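As a concrete illustration, the cache line an address falls into is just the address with its low 6 bits cleared (for 64-byte lines):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t addr = 0x7ffeee63dabc;
    uint64_t line = addr & ~(uint64_t)63;   /* 64-byte lines: clear low 6 bits */
    printf("address 0x%llx lives in cache line 0x%llx\n",
           (unsigned long long)addr, (unsigned long long)line);
    return 0;
}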
programming language design 101
C doesn't want you to know about all this stuff.
C is a set of rules about how you can and can't write programs. The people who design C don't want to have to explain all this stuff, so they make rules about what you can and can't do with pointers, and that's the end of it.
For example, the language doesn't know about virtual memory, because not all types of computers have virtual memory. Dishwashers or microwaves have no use for it and it would be a waste of money.
What does that Hex address physically stand for? Is it a part of my computer's L1 Cache? RAM? SSD?
The address 0x7ffeee63dabc means address 0xabc within virtual page 0x7ffeee63d. It might be on your SSD at the moment or in RAM; if you access it then it has to come into RAM. It might also be in cache at the moment but there's no good way to tell. The address doesn't change no matter where it goes to.
You should think of memory as an abstract mapping from addresses to values, nothing more.
Whether your actual hardware implements it as a single chunk of memory or as a complicated hierarchy of caches is not relevant until you try to optimize for very specific hardware, which you will not want to do 99% of the time.
In general, memory is anything that stores data either temporarily or non-volatilely. Temporary memory is lost when the machine is turned off and is usually referred to as RAM or simply "memory". Non-volatile memory is kept on a hard disk, flash drive, EEPROM, etc. and is usually referred to as ROM or storage.
Caches are also a type of temporary memory, but they are referred to simply as cache and are not considered part of the RAM. The RAM in your PC is also referred to as "physical memory" or "main memory".
When programming, all the variables are usually in main memory (more on this later) and are brought into the caches (L1, L2, etc.) while they are being used. But the caches are, for the most part, transparent to application developers.
Now there is another thing to mention before I answer your question. The addresses a program uses are not necessarily the addresses of physical memory. Addresses are translated from "virtual addresses" to "physical addresses" by an MMU (memory management unit) or a similar CPU feature, which the OS controls. The MMU exists for many reasons; two of them are to hide and protect the OS's memory and other apps' memory from wrong memory accesses by a program. This way a program can neither read nor alter the OS's or other programs' memory.
Further, when there is not enough RAM to hold all the memory that apps are requesting, the OS can move some of that memory out to non-volatile storage. Because of virtual addresses, a program cannot easily tell whether its memory is actually in RAM or in storage. This way programs can allocate a lot more memory than there is RAM. This is also why programs become very slow when they consume a lot of memory: it takes a long time to bring the data from storage back into main memory.
So, the address that you are printing is most likely the virtual address.
You can read something about those topics here:
https://en.wikipedia.org/wiki/Memory_management_(operating_systems)
https://en.wikipedia.org/wiki/Virtual_memory
From the C standard's point of view, memory is the storage for objects. How it works and how it is organized is left to the implementation.
Even printing pointers is, from the C point of view, meaningless (though it can be informative and interesting from the implementation's point of view).
If your code is running under a modern operating system [1], pointer values almost certainly correspond to virtual memory addresses, not physical addresses. There's a virtual memory system that maps the virtual address your code sees to a physical address in main memory (RAM), but as pages get swapped in and out, that physical address may change.
[1] For desktops, anything newer than the mid-'90s. For mainframes and minis, almost anything newer than the mid-'60s.
Is it a part of my computer's L1 Cache? RAM? SSD?
The short answer is RAM. This address is usually associated with a unique location inside your RAM. The long answer is, well - it depends!
Most machines today have a Memory Management Unit (MMU) which sits in-between the CPU and the peripherals attached to it, translating 'virtual' addresses seen by a program to real ones that actually refer to something physically attached to the bus. Setting up the MMU and allotting memory regions to your program is generally the job of the Operating System. This allows for cool stuff like sharing code/data with other running programs and more.
So the address that you see here may not be the actual physical address of a RAM location at all. However, with the help of the MMU, the OS can accurately and quickly map this number to an actual physical memory location somewhere in the RAM and allow you to store data in RAM.
Now, any accesses to the RAM may be cached in one or more of the available caches. Alternatively, it could happen that your program memory temporarily gets moved to disk (swapfile) to make space for another program. But all of this is completely automatic and transparent to the programmer. As far as your program is concerned, you are directly reading from or writing to the available RAM and the address is your handle to the unique location in RAM that you are accessing.

Means to allocate contiguous physical memory

I am aware that with C's malloc and posix_memalign one can allocate contiguous memory in the virtual address space of a process. However, I was wondering whether one can somehow allocate a buffer of physically contiguous memory? I am investigating side-channel attacks that exploit the L2 cache, so I want to be sure that I can access the right cache lines.
Your best and easiest bet for contiguous memory is to request a single "huge" page from the system. The availability of those depends on your CPU and kernel options (on x86_64, 2 MB huge pages are usually available, and some CPUs can also do 1 GB pages; other architectures can be more flexible than this). Check the Hugepagesize field in /proc/meminfo for the size of huge pages on your setup.
Those can be accessed in two ways:
By means of a MAP_HUGETLB flag passed to mmap() (a minimal sketch follows this list). This way you can be sure that the "huge" virtual page corresponds to a contiguous physical memory range. Unfortunately, whether the kernel can supply you with a "huge" page depends on many factors (the current layout of memory utilization, kernel options, etc. - also see the hugepages kernel boot parameter).
By means of mapping a file from a dedicated HugeTLB filesystem (see here: http://lwn.net/Articles/375096/). With HugeTLB file system you can configure the number of huge pages available in advance for some assurance that the necessary amount of huge pages will be available.
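Here is a minimal sketch of the first option. It is Linux-specific and assumes at least one 2 MB huge page has been reserved (e.g. via /proc/sys/vm/nr_hugepages):

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 2 * 1024 * 1024;    /* one 2 MB huge page (see Hugepagesize) */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");  /* fails if no huge pages are available */
        return 1;
    }
    /* The whole 2 MB behind p is one physically contiguous range. */
    munmap(p, len);
    return 0;
}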
The other approach is to write a kernel module which allocates contiguous physical memory on the kernel side and then maps it into your process's address space on request. This approach is sometimes employed on special-purpose hardware in embedded systems. Of course, there's still no guarantee that the kernel-side memory allocator will be able to come up with an appropriately sized contiguous physical address range, so on some occasions such address ranges are pre-reserved at boot (one dumb approach is to pass the max_addr parameter to the kernel on boot to leave some of the RAM out of the kernel's reach).
On (almost [Note 1]) all virtual memory architectures, virtual memory is mapped to physical memory in units of a "page". The size of a page is (almost) always a power of 2, and pages are aligned by that size, because the mapping is done by only using the high-order bits of the address. It's common to see a page size of 4K (12 bits of address), although modern CPUs have an option to map much larger pages in order to reduce the size of mapping tables.
Since L2_CACHE_SIZE here is presumably the cache line size (e.g. 64 bytes), it will generally also be a power of 2 and will be smaller than the page size, so any single allocation of that size, aligned to that size, will necessarily sit within a single page, and its bytes will be physically contiguous as well.
So in this particular case, you can be assured that your allocated memory will be a single cache line (at least, on standard machine architectures).
Note 1: Undoubtedly there are machines -- possibly imaginary -- which do not function this way. But the one you are playing with is not one of them.
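For the cache-line case the question asks about, a minimal sketch along those lines (assuming 64-byte lines; the constant is illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *buf = NULL;
    /* One 64-byte, 64-byte-aligned block: it cannot straddle a cache line
       or a page boundary, so its bytes are physically contiguous. */
    if (posix_memalign(&buf, 64, 64) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }
    printf("line-aligned buffer at %p\n", buf);
    free(buf);
    return 0;
}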

Is the Kernel Virtual Memory struct first formed when the process is about to execute?

I have been bothered by similar questions indirectly in my other posts. Now my understanding is better, and thus my questions are better. So I want to summarize the facts here. This example is based on a 32-bit x86 system.
Please say yes/no to my points. If no, then please explain.
1. The MMU will look into the CR3 register to find the base address of the process's page directory.
2. The CR3 register is set by the kernel.
3. Now the MMU, after reading the page directory base address, will offset into it to find the page table (using the index calculated from the VA), read the page frame number from the page table entry, and finally apply the offset from the VA within that page frame. It gets the physical memory address. All this is done in the MMU, right? (The address split is sketched in code just after these points.) I don't know: when the MMU is disabled, who will do all this circus? If software, then it will be slow, right?
4. I know that a page fault occurs when the MMU cannot resolve the address. The kernel is informed. The kernel will update the page table based on what it reads from the kernel virtual memory area structs. Am I correct?
5. Keeping point 4 in mind: does that mean that before executing any process, perhaps during loading, the kernel first fills in the kernel virtual memory area structs - for example, where the BSS, code, and data sections of memory will be? Some sections could be in RAM and some on the storage device. When the sections of the program are moved from storage to main memory, I assume the kernel updates the kernel virtual memory area structs. Am I correct here? So it is the kernel that keeps close track of the program's location - whether on a storage device or in RAM - via the device's inode number and the file offset.
6. Sequence-wise: during process loading (maybe by a loader program), the kernel populates the data in the kernel virtual memory area structs and also sets the CR3 register. Now the process starts executing, and it will initially take some frequent page faults. The VM area structs will be updated (if required) and then the page table, and the MMU will then succeed in translating the address. So, when I say a process is accessing memory, it is the MMU which is accessing the memory on behalf of the process. This is all about user space; kernel space is entirely different. Kernel space doesn't need the MMU: it can map directly to physical addresses (low mem). For high mem (to access user space from kernel space), it will do a temporary page table update internally - a separate, temporary page table for the kernel. The kernel space doesn't need the MMU. Am I correct?
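(Here is the address split referenced in point 3 as a minimal sketch, for classic non-PAE x86-32 with 4 KB pages; the example VA is arbitrary.)

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    uint32_t va = 0xBF801234;                /* an arbitrary example VA      */
    uint32_t pd_index = va >> 22;            /* top 10 bits: page directory  */
    uint32_t pt_index = (va >> 12) & 0x3FF;  /* next 10 bits: page table     */
    uint32_t offset   = va & 0xFFF;          /* low 12 bits: offset in frame */
    printf("PD entry %" PRIu32 ", PT entry %" PRIu32 ", offset 0x%03" PRIX32 "\n",
           pd_index, pt_index, offset);
    return 0;
}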
When the MMU is disabled, who will do all this circus?
Nobody. All this circus is intended to do two things: translate the virtual address you gave it into a real address, and, if it can't do that, abort the instruction entirely and start executing a handler routine at an architecturally pre-defined address ("page fault" being the basic case).
When the MMU is shut off, no translation is done and the address you gave it is fed directly down the CPU's address-processing pipe just as any address the MMU might have translated it to would have been.
So, when I say a process is accessing memory, it is the MMU which is accessing the memory on behalf of the process.
You're on the right track here: the MMU is mediating the access, but it isn't doing the access. It does only what you described before - the translation. What's generally called the Load/Store unit gets the address next, and it is the one that handles talking to whatever holds the closest good copy of the data at that address; it "does the access".
The kernel space doesn't need the MMU; it can map directly to physical addresses
That depends on how you define "need". The kernel can certainly shut the MMU off, but it almost never does. First, it has to talk to user space, and the MMU has to be running to translate user-space addresses into ones the Load/Store unit can use. Second, the flexibility and protection provided by the MMU are very valuable; they're not discarded without a really compelling reason. I know at least one OS will (or would, it's been a while) run some bulk copies MMU-off, but that's about it.

C accessing memory location

I mean the physical memory, the RAM.
In C you can access any memory address, so how does the operating system prevent your program from changing memory that is not in your program's memory space?
Does it set specific memory addresses as the beginning and end for each program? If so, how does it know how much is needed?
Your operating system kernel works closely with the memory management unit (MMU) hardware, when both the hardware and the OS support this, to make it impossible to access memory you have been denied access to.
Generally speaking, this also means the addresses you access are not physical addresses but rather are virtual addresses, and hardware performs the appropriate translation in order to perform the access.
This is what is called a memory protection. It may be implemented using different methods. I'd recommend you start with a Wikipedia article on this subject — http://en.wikipedia.org/wiki/Memory_protection
Actually, your program is allocated virtual memory, and that's what you work with. The OS gives you a part of the RAM; you can't access other processes' memory (unless it's shared memory - look it up).
It depends on the architecture; on some it's not even possible to prevent a program from crashing the system, but generally the platform provides some means to protect memory and separate the address spaces of different processes.
This has to do with a thing called 'paging', which is provided by the CPU itself. In old operating systems, you had 'real mode', where you could directly access memory addresses. In contrast, paging gives you 'virtual memory', so that you are not accessing the raw memory itself, but rather, what appears to your program to be the entire memory map.
The operating system does "memory management", often coupled with TLBs (Translation Lookaside Buffers) and virtual memory: every address is translated through pages, which the operating system can tag as readable, writable, or executable in the current process's context.
The minimum requirement for a processor's MMU (memory management unit) is to restrict the accessible memory, in the current context, to a range that can only be set via processor registers in supervisor mode (as opposed to user mode).
The logical address is generated by the CPU and mapped to the physical address by the memory management unit. Unlike the physical address space, the logical address space is not restricted by the memory size, and you only ever work with the logical address space. The address binding is done by the MMU, so you never deal with the physical address directly.
Most computers (and all PCs since the 386) have something called the Memory Management Unit (or MMU). Its job is to translate the local addresses used by a program into the physical addresses needed to fetch real bytes from real memory. It's the operating system's job to program the MMU.
As a result of this, programs can be loaded into any region of memory and appear, from that program's point of view while executing, to be at any other address. It's common to find that the code of all programs appears (locally) to be at the same address and their data always appears (locally) to be at the same address, even though physically they are in different locations. With each memory access, the MMU transparently translates from the local address space to the physical one.
If a program tries to access a memory address that has not been mapped into its local address space, the hardware generates an exception, which typically gets flagged as a "segmentation violation", followed by the forcible termination of the program. This protects the memory of other processes from being accessed (a small demonstration follows this answer).
But that doesn't have to be the case! On systems with "virtual memory" and current resource demands on RAM that exceed the amount of physical memory, some pages (just blocks of memory of a common size, often on the order of 4-8kB) can be written out to disk and given as RAM to a program trying to allocate and use new memory. Later on, when that page is needed by whatever program owns it, the memory access causes an exception and the OS swaps out some other memory page and re-loads the needed one from disk. The program that "page-faulted" gets delayed while this happens but otherwise notices nothing.
There are lots of other tricks the MMU/OS can do as well, like sharing memory between processes, making a disk file appear to be direct-memory-accessible, setting some pages as "NX" so they can't be treated as executable code, using arbitrary sections of the logical memory space regardless of how much and at what address the physical ram uses, and more.
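Coming back to the "segmentation violation" above, you can trip the protection deliberately; a minimal sketch that a typical system kills with SIGSEGV:

#include <stdio.h>

int main(void) {
    int *p = (int *)16;   /* an address almost certainly not mapped for us */
    printf("%d\n", *p);   /* the MMU faults; the OS terminates the program */
    return 0;
}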

Can malloc return same address in two different processes?

Suppose I have two processes, a and b, on Linux, and in both processes I use malloc() to allocate memory.
Is there any chance that malloc() returns the same starting address in the two processes?
If not, who takes care of this?
If yes, then both processes can access the same data at this address?
Is there any chance that malloc() returns the same starting address in two processes?
Yes, but this is not a problem.
What you're not understanding is that operating systems handle physical memory for you - programs only ever see virtual addresses. Each process gets its own virtual address space, which the operating system (let's stick with 32-bit for now) divides up. On Windows, the top half (0x80000000 and above) belongs to the kernel and the lower half to the user-mode process. This is referred to as the 2GB/2GB split. On Linux, the divide is 3GB/1GB - see this article:
Kernel memory is defined to start at PAGE_OFFSET, which in x86 is 0xC0000000, or 3 gigabytes. (This is where the 3gig/1gig split is defined.) Every virtual address above PAGE_OFFSET is a kernel address, and any address below PAGE_OFFSET is a user address.
Now, when a process switch (as opposed to a context switch) occurs, all of the pages belonging to the current process are unmapped from virtual memory (not necessarily paged out) and all of the pages belonging to the to-be-run process are mapped in (disclaimer: this might not be exactly true; one could mark pages dirty etc. and map on access instead, theoretically).
The reason for the split is that, for performance reasons, the upper half of the virtual memory space can remain mapped to the operating system kernel across such switches.
So, although malloc might return the same value in two given processes, that doesn't matter because:
physically, they're not the same address.
the processes don't share virtual memory anywhere.
For 64-bit systems, since we currently use only 48 of those bits, there is a gulf between the top of user space and the bottom of kernel space which is not addressable (yet).
Yes, malloc() can return the same pointer value in separate processes, if the processes run in separate address spaces, which is achieved via virtual memory. But they won't access the same physical memory location in that case and the data at the address need not be the same, obviously.
A process is a collection of threads plus an address space. This address space is called virtual because not every byte of it is necessarily backed by physical memory. Segments of a virtual address space will eventually be backed by physical memory if the application actually ends up using that memory.
So, malloc() may return an identical address in two processes, but this is not a problem, since those malloc'd regions will be backed by different pieces of physical memory.
Moreover, calls into a single process's malloc() are serialized (the implementation is mostly not reentrant), so calling malloc() from different threads sharing the same address space won't result in the same virtual address being returned.
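A quick way to see "same pointer value, different memory" is fork(): the child inherits the parent's address-space layout, so the two malloc() calls below will typically print the same pointer while holding different data (a Linux/POSIX sketch):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();             /* two processes from one program        */
    if (pid < 0) { perror("fork"); return 1; }

    int *p = malloc(sizeof *p);     /* each process allocates independently  */
    if (!p) return 1;
    *p = (pid == 0) ? 111 : 222;    /* distinct data, (likely) same address  */
    printf("pid %d: p = %p, *p = %d\n", (int)getpid(), (void *)p, *p);

    if (pid > 0) wait(NULL);
    free(p);
    return 0;
}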
