I am confused over what is meant by virtual address space. In a 32 bit machine a process can address 2^32 memory locations. Does that mean the virtual address space of every process is 2^32 (4GB) ?
The following is a snapshot of the virtual address space of a process. Can this grow up to 4GB? Is there any limit on the number of processes in such a system?
Can this grow up to 4GB?
The size of the address space is capped by the number of unique pointer values. For a 32-bit processor, a 32-bit value can represent 2^32 distinct values. If you allow each such value to address a different byte of memory, you get 2^32 bytes, which equals four gigabytes.
So, yes, the virtual address space of a process can theoretically grow to 4 GB. In reality, however, it may be further limited by the system and processor. For example:
This theoretical maximum cannot be achieved on the Pentium class of processors, however. One reason is that the lower bits of the segment value encode information about the type of selector. As a result, of the 65536 possible selector values, only 8191 of them are usable to access user-mode data. This drops you to 32TB.
Note that there are two ways to allocate memory from the system: you can allocate memory for your process implicitly using C's malloc (your question is tagged c), or explicitly map file bytes or anonymous pages into your address space, e.g. with mmap.
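As a rough illustration of the two styles, here is a minimal POSIX sketch (the 4096-byte size is arbitrary, and error handling is kept short):

    #define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS with strict compilers on glibc */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void) {
        /* 1. Implicit: the C allocator obtains address space on your behalf. */
        char *heap_buf = malloc(4096);
        if (heap_buf == NULL) { perror("malloc"); return 1; }

        /* 2. Explicit: map anonymous pages (or a file) into the address space. */
        char *mapped = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mapped == MAP_FAILED) { perror("mmap"); free(heap_buf); return 1; }

        printf("malloc gave %p, mmap gave %p\n", (void *)heap_buf, (void *)mapped);

        munmap(mapped, 4096);
        free(heap_buf);
        return 0;
    }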
Is there any limit on the number of processes in such a system?
A process includes one or more threads that actually execute the code in the process (technically, processes don't run, threads do) and that are represented with kernel thread objects.
According to some tests carried out here, a 32-bit Windows XP system with the default 2GB user address space can create approximately 2025 threads.
A 32-bit test program running on 64-bit Windows XP, however, with 4GB of address space available to it, created close to 3204 threads.
Still, the exact thread and process limit is extremely variable; it depends on a lot of factors: the way the threads specify their stack size, the way processes specify their minimum working set, the amount of physical memory available, and the system commit limit. In any case, you don't usually have to worry about this on modern systems: if your application really exceeds the thread limit, you should rethink your design, as there are almost always alternate ways to accomplish the same goals with a reasonable number of threads.
Yes, the virtual address space of every process is 4GB on 32-bit systems (2^32 bytes). In reality, the small amount of virtual memory that is actually used corresponds to locations in the processor cache(s), physical memory, or the disk (or wherever else the computer decides to put stuff).
Theoretically (and this behavior is pretty common among the common operating systems), a process could actually use all its virtual memory if the OS decided to put everything it couldn't fit in physical memory onto the disk, but this would make the program extremely slow because every time it tried to access some memory location that wasn't cached, it would have to go fetch it from the disk.
You asked if the picture you gave could grow up to 4GB. Actually, the picture you gave already covers all 4GB; it is a way of partitioning a process's 4GB of virtual memory into different sections. Also, if you're thinking of the heap and the stack "growing", they don't really grow; they have a set amount of memory allocated for them in that partitioning layout, and they just utilise that memory however they want (a stack moves a pointer around, a heap maintains a data structure of used and unused memory, etc.).
Have you read Wikipedia's pages on virtual memory, processes, and address spaces?
Which book did you read on advanced Unix programming, or on advanced Linux programming?
Usually, the address space is the set of segments which are valid (not in blue in your figure).
See also mmap(2) and execve(2) pages.
Try (on a Linux system)
cat /proc/self/maps
and
cat /proc/$$/maps
to understand a bit more.
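If you would rather do this from a program you control, here is a minimal sketch along the same lines (Linux-specific, since it reads /proc; the small malloc is only there so a heap region is in use):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        void *p = malloc(1024);     /* a small allocation so the heap is in use */

        FILE *maps = fopen("/proc/self/maps", "r");
        if (maps == NULL) { perror("fopen"); return 1; }

        char line[512];
        while (fgets(line, sizeof line, maps) != NULL)
            fputs(line, stdout);    /* address range, permissions, backing object */

        fclose(maps);
        free(p);
        return 0;
    }

Each output line shows an address range, its permissions, and what backs it (a file, the heap, the stack, and so on).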
See also this question and this one. Read Operating Systems: Three Easy Pieces.
Of course, the kernel is able to set some limits (see also the setrlimit(2) syscall), and there are resource constraints (swap space, RAM, ...).
Answering the neglected part...
There's a limit on how many processes there can be. All per-process data structures that the kernel keeps in its portion of the virtual address space (which is shared, otherwise you wouldn't be able to access the kernel in every process) take some space. So, for example, if there's 1GB available for this data and only a 4KB page is needed per process in the kernel, then you arrive at a maximum of roughly 250 thousand processes. In practice, this number is usually much smaller because things are more complex and there's physical memory reserved for various things for every process. See, for example, Mark Russinovich's article on process and thread limits in Windows for more details.
I am currently taking a course in operating systems and I came across address virtualization. I will give a brief overview of what I know and follow that with my question.
Basically, the CPU (in modern microprocessors) generates virtual addresses, and then an MMU (memory management unit) takes care of translating those virtual addresses to their corresponding physical addresses in the RAM. The example the professor gave for why there is a need for virtualization: You compile a C program. You run it. And then you compile another C program. You try to run it but the resident running program in memory prevents loading a newer program even when space is available.
From my understanding, with no virtualization, if the compiler generates two physical addresses that are the same, the second program won't run because it thinks there isn't enough space for it. When we virtualize this, as in the CPU generates only virtual addresses, the MMU will deal with this "collision" and find a spot for the other program in RAM. (Our professor gave the example of the MMU being a mapping table that takes a virtual address and maps it to a physical address.) I thought that idea was very similar to resolving collisions in a hash table.
Could I please get some input on my understanding? Any further clarification is appreciated.
Could I please get some input on my understanding? Any further clarification is appreciated.
Your understanding is roughly correct.
Clarifications:
The data structures are nothing like a hash table.
If anything, the data structures are closer to a B-tree, but even there, there are important differences. It is really closest to a (Java) N-dimensional array which has been sparsely allocated.
It is mapping pages rather than complete virtual / physical addresses. (A complete address is a page address + an offset within the page.)
There is no issue with collisions. At any point in time, the virtual-to-physical mappings for all users / processes give a one-to-one mapping from (process id + virtual page) to either a physical RAM page or a disk page (or both).
The reasons we use virtual memory are:
process isolation; i.e. one process can't see or interfere with another process's memory
simplifying application writing; i.e. each process thinks it has a contiguous set of memory addresses, and the same set each time. (To a first approximation ...)
simplifying compilation, linking and loading; i.e. the compilers, linkers and loaders do not need to "relocate" code at compile time or load time to take other programs into account.
to allow the system to accommodate more processes than it has physical RAM for ... though this comes with potential risks and performance penalties.
I think you have a fundamental misconception about what goes on in an operating system in regard to memory.
(1) You are describing logical memory, not virtual memory. Virtual memory refers to the use of disk storage to simulate memory. Unmapped pages of logical memory get mapped to disk space.
Sadly, the terms logical memory and virtual memory get conflated, but they are distinct concepts and the distinction is becoming increasingly important.
(2) Programs run in a PROCESS. A process only runs one program at a time (in Unix, each process generally runs only one program in its lifetime, or two if you count the cloned caller).
In modern systems each process gets a logical address space (sequential addresses) that can be mapped to physical locations or to no location at all. Generally, part of that logical address space is mapped to a kernel area that is shared by all processes. The logical address space is created with the process. No address space, no process.
In a 32-bit system, addresses 0-7FFFFFFF might be user addresses that are (generally) mapped to unique physical locations, while 80000000-FFFFFFFF might be mapped to a system address space that is the same for all processes.
(3) Logical memory management primarily serves as a means of security; not as a means for program loading (although it does help in that regard).
(4) This example makes no sense to me:
You compile a C program. You run it. And then you compile another C program. You try to run it but the resident running program in memory prevents loading a newer program even when space is available.
You are ignoring the concept of a PROCESS. A process can only have one program running at a time. In systems that do permit serial running of programs within the same process (e.g., VMS), the executing program prevents loading another program (or the loading of another program causes the termination of the running program). It is not a memory issue.
(5) This is not correct at all:
From my understanding, with no virtualization, if the compiler generates two physical addresses that are the same, the second program won't run because it thinks there isn't enough space for it. When we virtualize this, as in the CPU generates only virtual addresses, the MMU will deal with this "collision" and find a spot for the other program in RAM.
The MMU does not deal with collisions. The operating system sets up tables that define the logical address space when the process starts. Logical memory has nothing to do with hash tables.
When a program accesses logical memory the rough sequence is:
Break down the address into a page and an offset within the page.
Does the page have a corresponding entry in the page table? If not, FAULT.
Is the entry in the page table valid? If not FAULT.
Does the page table entry allow the type of access (read/write/execute) requested in the current operating mode (kernel/user/...)? If not FAULT.
Does the entry map to a physical page? If not, PAGE FAULT (go load the page from disk, i.e. virtual memory, and try again).
Access the physical memory referenced by the page table.
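To make that sequence concrete, here is a toy, single-level version of the lookup in C. Real MMUs do this in hardware, usually with multi-level tables; the names here (pte_t, translate, the table sizes) are invented purely for illustration:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 12                       /* 4 KB pages              */
    #define PAGE_SIZE (1u << PAGE_BITS)
    #define NUM_PAGES 1024                     /* toy-sized address space */

    typedef struct {
        bool     valid;                        /* entry in use at all?    */
        bool     present;                      /* backed by a RAM frame?  */
        bool     writable;
        uint64_t frame;                        /* physical frame number   */
    } pte_t;

    static pte_t page_table[NUM_PAGES];

    /* Returns true and fills *paddr on success, false on any kind of fault. */
    static bool translate(uint64_t vaddr, bool write, uint64_t *paddr)
    {
        uint64_t page   = vaddr >> PAGE_BITS;          /* 1. page + offset        */
        uint64_t offset = vaddr & (PAGE_SIZE - 1);

        if (page >= NUM_PAGES)       return false;     /* 2. no entry: FAULT      */
        pte_t *pte = &page_table[page];
        if (!pte->valid)             return false;     /* 3. invalid entry: FAULT */
        if (write && !pte->writable) return false;     /* 4. access check: FAULT  */
        if (!pte->present)           return false;     /* 5. PAGE FAULT: the OS
                                                             would load the page
                                                             from disk and retry  */
        *paddr = (pte->frame << PAGE_BITS) | offset;   /* 6. physical address     */
        return true;
    }

    int main(void) {
        /* Map virtual page 3 to physical frame 42, readable and writable. */
        page_table[3] = (pte_t){ .valid = true, .present = true,
                                 .writable = true, .frame = 42 };

        uint64_t paddr;
        if (translate(3 * PAGE_SIZE + 0x10, true, &paddr))
            printf("virtual 0x%llx -> physical 0x%llx\n",
                   (unsigned long long)(3 * PAGE_SIZE + 0x10),
                   (unsigned long long)paddr);
        else
            printf("fault\n");
        return 0;
    }

A real operating system would handle each FAULT case differently (kill the process, grow a region, or page data in from disk) rather than simply returning false.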
I am aware that with C's malloc and posix_memalign one can allocate contiguous memory from the virtual address space of a process. However, I was wondering whether one can somehow allocate a buffer of physically contiguous memory? I am investigating side-channel attacks that exploit the L2 cache, so I want to be sure that I can access the right cache lines.
Your best and easiest way to get contiguous memory is to request a single "huge" page from the system. The availability of those depends on your CPU and kernel options (on x86_64, 2MB huge pages are usually available, and some CPUs can also do 1GB pages; other architectures can be more flexible than this). Check the Hugepagesize field in /proc/meminfo for the size of huge pages on your setup.
Those can be accessed in two ways:
By means of the MAP_HUGETLB flag passed to mmap() (see the sketch after this list). This way you can be sure that the "huge" virtual page corresponds to a contiguous physical memory range. Unfortunately, whether the kernel can supply you with a "huge" page depends on many factors (the current layout of memory utilization, kernel options, etc. - also see the hugepages kernel boot parameter).
By means of mapping a file from a dedicated HugeTLB filesystem (see here: http://lwn.net/Articles/375096/). With HugeTLB file system you can configure the number of huge pages available in advance for some assurance that the necessary amount of huge pages will be available.
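A minimal sketch of the first option, assuming a Linux system configured with 2 MB huge pages (check Hugepagesize in /proc/meminfo and /proc/sys/vm/nr_hugepages first; the call simply fails if no huge pages are available):

    #define _GNU_SOURCE            /* for MAP_HUGETLB / MAP_ANONYMOUS */
    #include <stdio.h>
    #include <sys/mman.h>

    #define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* assumed 2 MB huge pages */

    int main(void) {
        void *buf = mmap(NULL, HUGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");         /* e.g. no huge pages reserved */
            return 1;
        }
        printf("2 MB physically contiguous region at %p\n", buf);
        munmap(buf, HUGE_PAGE_SIZE);
        return 0;
    }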
The other approach is to write a kernel module which allocates contiguous physical memory on the kernel side and then maps it into your process's address space on request. This approach is sometimes employed on special-purpose hardware in embedded systems. Of course, there's still no guarantee that the kernel-side memory allocator will be able to come up with an appropriately sized contiguous physical address range, so on some occasions such address ranges are pre-reserved at boot (one dumb approach is to pass the max_addr parameter to the kernel on boot to leave some of the RAM out of the kernel's reach).
On (almost [Note 1]) all virtual memory architectures, virtual memory is mapped to physical memory in units of a "page". The size of a page is (almost) always a power of 2, and pages are aligned by that size, because the mapping is done by only using the high-order bits of the address. It's common to see a page size of 4K (12 bits of address), although modern CPUs have an option to map much larger pages in order to reduce the size of mapping tables.
Since L2_CACHE_SIZE will generally also be a power of 2 and will be smaller than the page size, any single aligned allocation of size L2_CACHE_SIZE will necessarily be within a single page, so the bytes in the allocation will be physically contiguous as well.
So in this particular case, you can be assured that your allocated memory will be a single cache-line (at least, on standard machine architectures).
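For example, something along these lines gives you a cache-line-sized, cache-line-aligned buffer; the 64-byte line size is an assumption, so query it for your CPU (e.g. via sysconf(_SC_LEVEL2_CACHE_LINESIZE) on Linux) if you need to be exact:

    #define _POSIX_C_SOURCE 200112L   /* for posix_memalign */
    #include <stdio.h>
    #include <stdlib.h>

    #define L2_CACHE_SIZE 64          /* assumed cache-line size in bytes */

    int main(void) {
        void *buf = NULL;
        /* Alignment == size: the buffer cannot straddle a page boundary. */
        if (posix_memalign(&buf, L2_CACHE_SIZE, L2_CACHE_SIZE) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }
        printf("aligned buffer at %p\n", buf);
        free(buf);
        return 0;
    }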
Note 1: Undoubtedly there are machines -- possibly imaginary -- which do not function this way. But the one you are playing with is not one of them.
Suppose I have two processes, a and b, on Linux, and in both processes I use malloc() to allocate memory.
Is there any chance that malloc() returns the same starting address in the two processes?
If no, then who is going to take care of this?
If yes, then both processes can access the same data at this address.
Is there any chance that malloc() returns the same starting address in the two processes?
Yes, but this is not a problem.
What you're not understanding is that the operating system handles physical memory for you - programs only ever see virtual addresses. Each process gets its own virtual address space, and the operating system (let's stick with 32-bit for now) divides that space up. On Windows, the top half (0x80000000 and above) belongs to the kernel and the lower half to the user-mode process. This is referred to as the 2GB/2GB split. On Linux, the divide is 3GB/1GB - see this article:
Kernel memory is defined to start at PAGE_OFFSET, which in x86 is 0xC0000000, or 3 gigabytes. (This is where the 3GB/1GB split is defined.) Every virtual address above PAGE_OFFSET is the kernel, any address below PAGE_OFFSET is a user address.
Now, when a process switch (as opposed to a context switch) occurs, all of the pages belonging to the current process are unmapped from virtual memory (not necessarily paged out) and all of the pages belonging to the to-be-run process are mapped in (disclaimer: this might not be exactly true; one could mark pages dirty etc. and remap on access instead, theoretically).
The reason for the split is that, for performance reasons, the upper half of the virtual memory space can remain mapped to the operating system kernel.
So, although malloc might return the same value in two given processes, that doesn't matter because:
physically, they're not the same address.
the processes don't share virtual memory anywhere.
For 64-bit systems, since we're currently only using 48 of those bits, there is a gulf between the top of user mode and the bottom of kernel mode which is not addressable (yet).
Yes, malloc() can return the same pointer value in separate processes, if the processes run in separate address spaces, which is achieved via virtual memory. But they won't access the same physical memory location in that case and the data at the address need not be the same, obviously.
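A quick way to see this for yourself is a small fork() demo: both processes start from an identical heap, so the pointers printed will usually match, yet each process only sees its own data (a minimal sketch, not guaranteed on every allocator):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }

        char *p = malloc(64);            /* same heap state in both processes */
        if (p == NULL) { perror("malloc"); return 1; }

        if (pid == 0) {                  /* child */
            strcpy(p, "written by child");
            printf("child : p = %p -> \"%s\"\n", (void *)p, p);
            return 0;
        }

        strcpy(p, "written by parent");  /* parent */
        wait(NULL);                      /* let the child print first */
        printf("parent: p = %p -> \"%s\"\n", (void *)p, p);
        return 0;
    }

Both lines typically show the same pointer value but different strings, because the identical virtual address is backed by different physical memory in each process.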
A process is a collection of threads plus an address space. This address space is referred to as virtual because not every byte of it is necessarily backed by physical memory. Segments of a virtual address space will eventually be backed by physical memory if the application in the process actually ends up using that memory.
So, malloc() may return an identical address in two processes, but it is no problem, since these malloc'ed regions will be backed by different pieces of physical memory.
Moreover, malloc() implementations are mostly not reentrant, therefore calling malloc() in different threads sharing the same address space hopefully won't result in them returning the same virtual address.
The address space for 64 bit addressing is absolutely huge. I have a program that will mmap several chunks of memory, each of the order of 100 - 500 MB. I will inevitably be remapping a few times, which may cause some fragmentation of available contiguous space.
Whatever fragmentation of space occurs, it is surely going to be trivially small compared to the available address space.
My question is: given these constraints, under normal circumstances, can I expect all mmap requests to succeed (i.e. not fail due to fragmentation)? What reasons might there be for them failing?
I am aware that the heap doesn't have the whole space to itself, but I imagine it has the vast majority.
Mac OS / Linux.
Be aware that the address space size for "64bit operating systems" does not necessarily cover the full 64-bit range.
On x64 (64bit x86, aka 'AMD64'), for example, the actual available virtual address range is "only" 2x128TB, i.e. 48bit in two disjoint 47bit chunks. On some SPARC systems, it's 2x2TB, or 2x8TB (41/44 bits). This is due to the way the MMU works on these platforms.
In addition to these architectural limitations, the way the operating system lays out your address spaces also plays a role here.
64bit Windows on x64, for example, limits the virtual address size of an (even 64-bit) application to 8TB (each for the kernel and the user side).
On UN*X systems (including Linux, MacOSX and the *BSDs), there is RLIMIT_AS, which one can query via getrlimit() and - within system-specific limits - adjust via setrlimit(). The ulimit command uses those. They return/set the upper bound, though, i.e. the allowed total virtual address space, including all mappings that a process can create (via mmap(), or via the malloc backend, sbrk()).
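Querying that limit is straightforward; a minimal sketch (POSIX; the values are in bytes):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;
        if (getrlimit(RLIMIT_AS, &rl) != 0) { perror("getrlimit"); return 1; }

        if (rl.rlim_cur == RLIM_INFINITY)
            printf("address-space limit: unlimited (soft)\n");
        else
            printf("address-space limit: %llu bytes (soft)\n",
                   (unsigned long long)rl.rlim_cur);
        return 0;
    }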
But the total address space size is different from the maximum size of a single mapping ...
Given this, it's not that hard to exhaust the virtual address space even on 64bit Linux; just try, for a test, to mmap() the same 500GB file, say, two hundred times. The mmap will fail, eventually.
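A sketch of that experiment, assuming you have some suitably large file lying around (the path below is a placeholder; read-only private mappings cost no physical memory, only address space):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void) {
        int fd = open("/path/to/huge_file", O_RDONLY);   /* hypothetical path */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        for (int i = 0; ; i++) {
            void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                perror("mmap");                          /* typically ENOMEM */
                printf("failed after %d mappings of %lld bytes\n",
                       i, (long long)st.st_size);
                return 0;
            }
        }
    }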
In short:
mmap() will definitely fail once you're out of virtual address space. This is, somewhat surprisingly, achievable on many "64bit" architectures simply because virtual addresses might have fewer than 64 significant bits. The exact cutoff depends on your CPU and your operating system; an upper bound can be queried via getrlimit(RLIMIT_AS) or set via setrlimit(RLIMIT_AS).
Address space fragmentation can occur on 64bit by frequently using mmap() / munmap() in different orders and with different-sized blocks. This will ultimately limit the maximum size you'll be able to map as a single chunk. Predicting when exactly this will happen is hard since it both depends on your "mapping history" and the operating systems' virtual address space allocation algorithm. If ASLR (address space layout randomization) is done by the OS, it might be unpredictable in detail, and not exactly reproducible.
malloc() will also fail no later than when the total VA limit is reached, even on systems (like Linux) where overcommit allows you to "ask" for more memory than there is in the system (physical + swap).
On machines / operating systems where no overcommit is enabled, both malloc() and mmap() with MAP_ANON and/or MAP_PRIVATE will fail when physical + swap is exhausted, because these types of mappings require backing store by actual memory or swap.
Little technical update: like x86 and SPARC mentioned above, the new ARMv8 (64-bit ARM, called "AArch64" in Linux) MMU also has a "split" address space / an address space hole - out of the 64 bits in an address, only 40 are relevant. Linux gives 39 bits to user space (virtual addresses 0 ... onwards) and 39 bits to the kernel (virtual addresses ... 0xFFFFFFFF.FFFFFFFF), so the limit there is 512GB (minus what's already in use at the time the app tries the mmap).
See comments here (from the AArch64 architecture enabler kernel patch series).
mmap and malloc may fail due to lack of physical memory (RAM + swap). In general, the Unix standard speaks of address ranges that are "allowed for the address space of a process". Apart from that,
If a function be advertised to return an error code in the event of difficulties, thou shalt check for that code, yea, even though the checks triple the size of thy code and produce aches in thy typing fingers, for if thou thinkest "it cannot happen to me", the gods shall surely punish thee for thy arrogance.
(Henry Spencer, The Ten Commandments for C Programmers)
Just check the list of errors in man mmap.
Well, the OS / C library is free to decide whether there is enough memory or not. There's really no way for you to know how it decides this (it's obviously OS-specific, possibly even tunable), but it's possible that at some point the OS says "no more".
Possible reasons for failure to allocate memory:
Linux will overcommit memory, but this is tunable. Depending on how it is configured, you may get ENOMEM as soon as you exceed RAM+Swap.
If memory limits have been set using ulimit, you may get ENOMEM even sooner.
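Whichever of these applies, the failure only shows up if you actually check the return values, as the quote above demands. A minimal sketch that deliberately requests an absurd amount so both calls fail (SIZE_MAX/2 bytes is never satisfiable on a real machine):

    #define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = SIZE_MAX / 2;     /* deliberately absurd request */

        void *a = malloc(len);
        if (a == NULL)
            fprintf(stderr, "malloc of %zu bytes failed\n", len);

        void *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (b == MAP_FAILED)
            perror("mmap");            /* typically ENOMEM */

        free(a);                       /* free(NULL) is a harmless no-op */
        return 0;
    }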
How much data can be malloc'ed, and how is the limit determined? I am writing an algorithm in C that repeatedly uses some data stored in arrays. My idea is to keep this data in dynamically allocated arrays, but I am not sure whether it's possible to have such amounts malloc'ed.
I use 200 arrays of size 2046, holding complex data of 8 bytes each. I use these throughout the process, so I do not wish to calculate them over and over.
What are your thoughts about feasibility of such an approach?
Thanks
Mir
How much memory malloc() can allocate depends on:
How much memory your program can address directly
How much physical memory is available
How much swap space is available
On a modern, flat-memory-model 32-bit system, your program can address 4 gigabytes, but some of the address space (usually 2 gigabytes, sometimes 1 gigabyte) is reserved for the kernel. So, as a rule of thumb, you should be able to allocate almost two gigabytes at once, assuming you have the physical memory and swap space to back it up.
On a 64-bit system, running a 64-bit OS and a 64-bit program, your addressable memory is essentially unlimited.
200 arrays of 2046 eight-byte elements is only about 3.3 MB, which is tiny by modern standards and may even fit entirely in cache (even on a smartphone).
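For completeness, a sketch of that allocation with the sizes from the question (float complex is typically 8 bytes; checking every malloc() is the only real concern at these sizes):

    #include <complex.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_ARRAYS 200
    #define ARRAY_LEN  2046

    int main(void) {
        float complex *data[NUM_ARRAYS];

        for (int i = 0; i < NUM_ARRAYS; i++) {
            data[i] = malloc(ARRAY_LEN * sizeof *data[i]);
            if (data[i] == NULL) {               /* allocation failed */
                fprintf(stderr, "out of memory at array %d\n", i);
                for (int j = 0; j < i; j++)
                    free(data[j]);
                return 1;
            }
        }

        printf("allocated %zu bytes in total\n",
               (size_t)NUM_ARRAYS * ARRAY_LEN * sizeof(float complex));

        for (int i = 0; i < NUM_ARRAYS; i++)
            free(data[i]);
        return 0;
    }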
A 32-bit OS has a limit of 4GB; typically some of that (up to half on Win32) is reserved for the operating system - mapping the address space of graphics card memory, etc.
Linux supports up to 64GB of physical memory (using Intel's 36-bit PAE) on 32-bit versions.
EDIT: although each process is still limited to 4GB of virtual address space.
The main problem with allocating large amounts of memory is if you need it to be locked in RAM - then you obviously need a lot of RAM. Or if you need it all to be contiguous - it's much easier to get 4 * 1Gb chunks of memory than a single 4Gb chunk with nothing else in the way.
A common approach is to allocate all the memory you need at the start of the program, so you can be sure that if the job isn't going to be possible it will fail instantly, rather than after it has done 90% of the work.
Don't run other memory intensive apps at the same time.
There are also a bunch of flags you can use to suggest to the kernel that this app should get priority in memory, or to keep its memory locked in RAM - sorry, it's been too long since I did HPC on Linux and I'm probably out of date with modern kernels.
I think that on most modern (64-bit) systems you can allocate 4GB at a time with a malloc(size_t) call, if that much memory is available. How big is each of those 'complex data' entries? If they are 256 bytes in size, then you'll only need to allocate 100MB.
256 bytes × 200 arrays × 2048 entries = 104,857,600 bytes
104,857,600 bytes / 1024 / 1024 = 100MB
So at 4096 bytes each, that's still only 1600MB, or ≈ 1.6GB, so it is feasible on most systems today; my four-year-old laptop has 3GB of internal memory. Sometimes I do image manipulation with GIMP and it takes up over 2GB of memory.
With some implementations of malloc(), the regions are not actually backed by memory until they really get used so you can in theory carry on forever (though in practice of course the list of allocated regions assigned to your process in the kernel takes up space, so you might find you can only call malloc() a few million times even if it never actually gives you any memory). It's called "optimistic allocation" and is the strategy used by Linux (which is why it then has the OOM killer, for when it was over-optimistic).