I read that arrays are contiguous in Virtual Memory but probably not in Physical memory, and I don't get that.
Let's suppose I have an array of size 4 KB (one page = one frame size). In virtual memory that array is one page.
In virtual memory every page is translated into one frame, so our array is still contiguous...
(In the page table we translate pages into frames, not every byte into its own frame...)
Side question (when answering this, please mention clearly that it's for the side question):
When allocating an array of size one page in virtual memory, does it have to be one page, or could it be split across two contiguous pages in virtual memory (for example, the bottom half of the first one and the top half of the second)? In that case, at worst the answer above is 2 pages, am I wrong?
Unless the start of the array happens to be aligned to the beginning of a memory page, it can still occupy two pages; it can start near the end of one page and end on the next page. Arrays allocated on the stack will probably not be forced to occupy a single page, because stack frames are simply allocated sequentially in the stack memory, and the array will usually be at the same offset within each stack frame.
The heap memory allocator (malloc()) could try to ensure that arrays that are smaller than a page will be allocated entirely on the same page, but I'm not sure if this is actually how most allocators are implemented. Doing this might increase memory fragmentation.
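As a quick way to see this in practice, here is a small sketch of my own (assuming 4096-byte pages, which is typical but not guaranteed; on POSIX you can query the real size with sysconf(_SC_PAGESIZE)) that checks whether a stack array happens to cross a page boundary:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    const uintptr_t page_size = 4096;   /* assumption: 4 KiB pages */
    char arr[1000];

    /* Compare the page number of the array's first and last byte. */
    uintptr_t first_page = (uintptr_t)&arr[0] / page_size;
    uintptr_t last_page  = (uintptr_t)&arr[sizeof arr - 1] / page_size;

    printf("the array %s a page boundary\n",
           first_page == last_page ? "does not cross" : "crosses");
    return 0;
}

Run it with different surrounding stack usage and the result can change, which is exactly the point made above about stack placement.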
I read that arrays are contiguous in Virtual Memory but probably not in Physical memory, and I don't get that.
This statement is missing something very important: the array size.
For small arrays the statement is wrong. For "large/huge" arrays the statement is correct.
In other words: The probability of an array being split over multiple non-contiguous physical pages is a function of the array size.
For small arrays the probability is close to zero, but the probability increases as the array size increases. When the array size increases above the system's page size, the probability gets closer and closer to 1. But an array requiring multiple pages may still be contiguous in physical memory.
For your side question:
With an array size equal to your system's page size, the array can at maximum span two physical pages.
Anything (array, structure, ...) that is larger than the page size must be split across multiple pages; and therefore may be "virtually contiguous, physical non-contiguous".
Without further knowledge or restriction; anything (array, structure, ...) that is between its minimum alignment (e.g. 4 bytes for an array of uint32_t) and the page size has a probability of being split across multiple pages; where the probability depends on its size and alignment. For example, if page size is 4096 bytes and an array has a minimum alignment of 4 bytes and a size of 4092 bytes, then there's 2 chances in 1024 that it will end up on a single page (and a 99.8% chance that it will be split across multiple pages).
Anything (variable, tiny array, tiny structure, ...) that has a size equal to its minimum alignment won't (shouldn't - see note 3) be split across multiple pages.
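To make the arithmetic in the 4092-byte example above concrete, here is a small sketch of my own (assuming the only variables are page size, minimum alignment and object size, that the object is no larger than a page, and that every aligned start offset is equally likely):

#include <stdio.h>

int main(void) {
    const unsigned page_size = 4096, alignment = 4, obj_size = 4092;

    /* Number of possible aligned start offsets within a page. */
    unsigned slots = page_size / alignment;

    /* Offsets that keep the whole object in one page: offset + obj_size <= page_size. */
    unsigned fitting = (page_size - obj_size) / alignment + 1;

    printf("%u of %u start offsets fit in one page (%.1f%% chance of being split)\n",
           fitting, slots, 100.0 * (slots - fitting) / slots);
    return 0;
}

For the numbers above this prints 2 of 1024, i.e. the 99.8% chance of being split mentioned earlier.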
Note 1: For anything using memory allocated from the heap, the minimum alignment can be assumed to be the (implementation defined) minimum alignment provided by the heap and not the minimum alignment of the object itself. E.g. for an array of uint16_t the minimum alignment would be 2 bytes; but malloc() will return memory with much larger alignment (maybe 16 bytes)
Note 2: When things are nested (e.g. array inside a structure inside another structure) all of the above applies to the outer structure only. E.g. if you have an array of uint16_t inside a structure where the array happens to begin at offset 4094 within the structure; then it will be significantly more likely that the array will be split across pages.
Note 3: It's possible to explicitly break minimum alignment using pointers (e.g. use malloc() to allocate 1024 bytes, then create a pointer to an array that begins at any offset you want within the allocated area).
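For example (a deliberately bad idea, shown only to illustrate Note 3; creating or dereferencing such a pointer is undefined behaviour in strict ISO C, especially on strict-alignment targets):

#include <stdint.h>
#include <stdlib.h>

int main(void) {
    char *buf = malloc(1024);            /* malloc() returns well-aligned memory */
    if (!buf) return 1;

    /* Point at an odd offset inside the allocation: this "array" of uint32_t
       no longer has its natural 4-byte alignment. */
    uint32_t *misaligned = (uint32_t *)(buf + 1);
    (void)misaligned;

    free(buf);
    return 0;
}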
Note 4: If something (array, structure, ...) is split across multiple pages; then there's a chance that it will still be physically contiguous. For worst case this depends on the amount of physical memory (e.g. if the computer has 1 GiB of usable physical memory and 4096 byte pages, then there's approximately 1 chance in 262000 that 2 virtually contiguous pages will be "physically contiguous by accident"). If the OS implements page/cache coloring (see https://en.wikipedia.org/wiki/Cache_coloring ) it improves the probability of "physically contiguous by accident" by the number of page/cache "colors" (e.g. if the computer has 1 GiB of usable physical memory and 4096 byte pages, and the OS uses 256 page/cache colors, then there's approximately 1 chance in 1024 that 2 virtually contiguous pages will be "physically contiguous by accident").
Note 5: Most modern operating systems use multiple page sizes (e.g. 4 KiB pages and 2 MiB pages, and maybe also 1 GiB pages). This can either make it hard to guess what the page size actually is, or improve the probability of "physically contiguous by accident" if you assume the smallest page size is used.
Note 6: For some CPUs (e.g. recent AMD/Zen) the TLBs behave as if pages are larger (e.g. as if you're using 16 KiB pages and not 4 KiB pages) if and only if page table entries are compatible (e.g. if 4 page table entries describe four physically contiguous 4 KiB pages with the same permissions/attributes). If an OS is optimized for these CPUs the result is similar to having an extra page size (4 KiB, "16 KiB", 2 MiB and maybe 1 GiB).
When allocating array in virtual memory of size one page does it have to be one page or could be split into two contiguous pages in virtual memory (for example bottom half of first one and top half of the second)?
When allocating an array in heap memory of size one page; the minimum alignment would be the implementation defined minimum alignment provided by the heap manager/malloc() (e.g. maybe 16 bytes). However; most modern heap managers switch to using an alternative (e.g. mmap() or VirtualAlloc() or similar) when the amount of memory being allocated is "large enough"; so (depending on the implementation and their definition of "large enough") it might be page aligned.
When allocating an array in raw virtual memory (e.g. using mmap() or VirtualAlloc() or similar yourself, and NOT using the heap and not using something like malloc()); page alignment is guaranteed (mostly because the virtual memory manager doesn't deal with anything smaller).
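A minimal sketch of that second case (POSIX/Linux assumed; on Windows you would use VirtualAlloc() instead), showing that the returned address is always page aligned:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* The offset within the page is always 0 for memory that comes
       straight from the virtual memory manager. */
    printf("page size = %zu, offset in page = %zu\n",
           page, (size_t)((uintptr_t)p % page));

    munmap(p, page);
    return 0;
}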
My background knowledge:
To my understanding, to be allocated/used properly, memory must be contiguous in the virtual address space, but doesn't have to be actually contiguous in the physical memory or even the physical memory address space.
This suggests to me that physical-to-virtual address translation works as a series of mappings, where any free memory blocks in the physical address space get assigned to a corresponding area in the virtual address space.
Setup to the question:
This answer, in response to a question about freeing memory in C, refers to memory fragmentation, a scenario in which (in this specific case) repeatedly allocating and freeing memory could result in there existing enough OS-allocated memory for future process usage, but it can't be used because it isn't contiguous in the free store linked-list.
If we could keep plucking memory blocks out of the OS-allocated memory that are not in use even if they are dispersed (not contiguous), wouldn't that fix the problem of memory fragmentation? To me, that seems exactly the same way that the physical to virtual memory address translations work, where non-contiguous blocks are utilized as if they were contiguous.
So, to repeat my question, why does memory have to be contiguous?
Two issues here:
It is necessary for each object to occupy a contiguous region in virtual memory, so that indexing and pointer arithmetic can be done efficiently. If you have an array int arr[5000];, then a statement like arr[i] = 0; boils down to simple arithmetic: the value of i is multiplied by 4 (or whatever sizeof(int) may be) and then added to the base address of arr. Addition is very fast for a CPU. If the elements of arr weren't located at consecutive virtual addresses, then arr[i] would require some more elaborate computation, and your program would be orders of magnitude slower. Likewise, with contiguous arrays, pointer arithmetic like ptr++ really is just addition.
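To illustrate that arithmetic (my own example; nothing here is specific to any particular compiler):

#include <stddef.h>
#include <stdio.h>

int main(void) {
    int arr[5000];
    size_t i = 1234;

    /* &arr[i] is computed as the base address of arr plus i * sizeof(int). */
    printf("%p\n", (void *)&arr[i]);
    printf("%p\n", (void *)((char *)arr + i * sizeof(int)));  /* same address */
    return 0;
}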
Virtual memory has granularity. Every mapping of a virtual to a physical address requires some metadata to be kept somewhere in memory (say 8 bytes per mapping), and when this mapping is used, it is cached by the CPU in a translation lookaside buffer which requires some silicon on the chip. If every byte of memory could be mapped independently, your mapping tables would require 8 times more memory than the program itself, and you'd need an immense number of TLBs to keep them cached.
So virtual memory is instead done in units of pages, typically 4KB or 16KB or so. A page is a contiguous 4K region of virtual memory that is mapped to a contiguous 4K region of physical memory. Thus you only need one entry in your mapping tables (page tables) and TLB for the entire page, and addresses within a page are mapped one-to-one (the low bits of the virtual address are used directly as the low bits of the physical address).
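A tiny sketch of that split (assuming 4 KiB pages, i.e. a 12-bit offset):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uintptr_t vaddr  = 0x12345abc;       /* an arbitrary virtual address       */
    uintptr_t vpn    = vaddr >> 12;      /* virtual page number: looked up in
                                            the page tables / TLB              */
    uintptr_t offset = vaddr & 0xfff;    /* low 12 bits: used unchanged as the
                                            offset into the physical frame     */

    printf("vpn = %#lx, offset = %#lx\n",
           (unsigned long)vpn, (unsigned long)offset);
    return 0;
}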
But this means that fragmentation by sub-page amounts can't be fixed with virtual memory. As in Steve Summit's example, suppose you allocated 1000 objects of 1KB each, which were stored consecutively in virtual memory. Now you free all the odd-numbered objects. Nominally there is now 500 KB of memory available. But if you now want to allocate a new object of size 2KB, none of that 500 KB is usable, since there is no contiguous block of size 2KB in your original 1000 KB region. The available 1KB blocks can't be remapped to coalesce them into a larger block, because they can't be separated from the even-numbered objects with which they share pages. And the even-numbered objects can't be moved around in virtual memory, because there may be pointers to those objects elsewhere in the program, and there is no good way to update them all. (Implementations that do garbage collection might be able to do so, but most C/C++ implementations do not, because that comes with substantial costs of its own.)
So, to repeat my question, why does memory have to be contiguous?
It doesn't have to be contiguous.
To avoid fragmentation within a block of OS-allocated memory (a page); you need to ensure that the memory being allocated from the "heap" (e.g. using "malloc()") is at least as large as a block of OS-allocated memory. This gives 2 possible options:
a) change the hardware (and OS/software) so that a block of OS-allocated memory is much smaller (e.g. maybe the same size as a cache line, or maybe 64 bytes instead of 4 KiB). This would significantly increase the overhead of managing virtual memory.
b) change the minimum allocation size of the heap so that it's much larger. Typically (for modern systems) if you "malloc(1);" it rounds the size up to 8 bytes or 16 bytes for alignment and calls it "padding". In the same way, it could round the size up to the size of a block of OS-allocated memory instead and call that "padding" (e.g. "malloc(1);" might have 4095 bytes of padding and cost 4 KiB of memory). This is worse than fragmentation because padding can't be allocated (e.g. if you did "malloc(1); malloc(1);" those allocations couldn't use different parts of the same block of OS-allocated memory).
However; this only really applies to small allocations. If you use "malloc();" to allocate a large amount of memory (e.g. maybe 1234 KiB for an array) most modern memory managers will just use blocks of OS-allocated memory, and won't have a reason to care about fragmentation for those large blocks.
In other words; for smaller allocations you can solve fragmentation in the way you've suggested but it would be worse than allowing some fragmentation; and for larger allocations you can solve fragmentation in the way you've suggested and most modern memory managers already do that.
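As a small demonstration of the rounding described in option b) above (glibc-specific: malloc_usable_size() is a GNU extension, and the exact numbers depend on the allocator):

#include <malloc.h>   /* for the glibc extension malloc_usable_size() */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p = malloc(1);
    if (!p) return 1;

    /* Typically prints something like 24 on 64-bit glibc: the 1-byte request
       was rounded up, and the extra bytes are unusable "padding". */
    printf("requested 1 byte, got %zu usable bytes\n", malloc_usable_size(p));

    free(p);
    return 0;
}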
struct page* alloc_pages(gfp_t gfp_mask, unsigned int order) is the function used to allocate pages in the kernel, and it allocates 2^order contiguous physical pages.
So this means pages will be allocated in quantities of 1, 2, 4, 8, 16 and so on.
What if only 3 pages are needed, or 5, 9, and so on?
From the link provided by tkausl:
The order is the power of two number of pages to allocate
So alloc_pages(gfp_mask, 3) will allocate 8 pages. alloc_pages(gfp_mask, 4) will allocate 16 pages, and so on.
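For example (a kernel-side sketch, not a complete module; get_order() rounds a byte count up to the nearest power-of-two page order):

#include <linux/gfp.h>
#include <linux/mm.h>

/* A request worth 3 pages gets rounded up to order 2, i.e. 4 contiguous pages. */
static struct page *alloc_three_pages_worth(void)
{
    unsigned int order = get_order(3 * PAGE_SIZE);   /* order == 2 */

    return alloc_pages(GFP_KERNEL, order);
}

/* The matching free must use the same order: __free_pages(page, order); */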
alloc_pages allocates contiguous page frames from physical memory. The Linux kernel has this ill-named, of course.
I believe it is using the buddy allocator.
Most of the time you don't even need contiguous page frames; this is mostly needed for hardware that does DMA transfers or similar. It's very unlikely that you'd need exactly 9 contiguous frames. If you really do, you can allocate 16 pages and free the 7 you don't need individually (as order=0 pages), for example.
If you want an exact number of pages, call alloc_pages_exact(). It takes the desired size in bytes and the GFP flags, and returns a page-aligned address. Call free_pages_exact() with the same size that was passed to alloc_pages_exact() to free the memory.
#define MY_BUF_SIZE 20000 /* size in bytes */
...
mydev->buf = alloc_pages_exact(MY_BUF_SIZE, GFP_KERNEL);
...
if (mydev->buf)
    free_pages_exact(mydev->buf, MY_BUF_SIZE);
Note that the implementation of alloc_pages_exact() is as follows:
Use the specified size in bytes to determine the power of two "page order" for the allocation.
Call alloc_page() (via __get_free_pages()) to allocate a single "compound page" of the determined page order, which consists of a power of two number of single pages.
Call split_page() to split the compound page into single pages.
Call free_page() on any unneeded pages at the end of the allocation.
The implementation of free_pages_exact() uses the specified size (which ought to match the size passed to alloc_pages_exact()) to determine the number of individual pages to free and frees that many contiguous pages (by calling free_page()) starting from the specified virtual address.
I'm doing some linux kernel development.
And I'm going to allocate some memory space with something like:
ptr = flex_array_alloc(size=136B, num=1<<16, GFP_KERNEL)
And ptr turns out to be NULL every time I try.
What's more, when I change the size to 20B or num to 256, there's nothing wrong and the memory can be obtained.
So I want to know if there are some limitations for requesting memory in linux kernel modules. And how to debug it or to allocate a big memory space.
Thanks.
And kzalloc has a similar behavior in my environment. That is, requesting a 136B * (1<<16) space failed, while 20B or 1<<8 succeed.
There are two limits to the size of an array allocated with flex_array_allocate. First, the object size itself must not exceed a single page, as indicated in https://www.kernel.org/doc/Documentation/flexible-arrays.txt:
The down sides are that the arrays cannot be indexed directly, individual object size cannot exceed the system page size, and putting data into a flexible array requires a copy operation.
Second, there is a maximum number of elements in the array.
Both limitations are the result of the implementation technique:
…the need for memory from vmalloc() can be eliminated by piecing together an array from smaller parts…
A flexible array holds an arbitrary (within limits) number of fixed-sized objects, accessed via an integer index.… Only single-page allocations are made…
The array is "pieced" together by using an array of pointers to individual parts, where each part is one system page. Since this array is also allocated, and only single-page allocations are made (as noted above), the maximum number of parts is slightly less than the number of pointers which can fit in a page (slightly less because there is also some bookkeeping data.) In effect, this limits the total size of a flexible array to about 2MB on systems with 8-byte pointers and 4kb pages. (The precise limitation will vary depending on the amount of wasted space in a page if the object size is not a power of two.)
I have 4GB of RAM installed on a Core2Duo PC with a 32-bit Windows 7 operating system. I have increased the paging size up to 106110MB, but after doing all this I am not able to significantly increase the maximum array size.
Following are the specs
memory
Maximum possible array: 129 MB (1.348e+08 bytes) *
Memory available for all arrays: 732 MB (7.673e+08 bytes) **
Memory used by MATLAB: 563 MB (5.899e+08 bytes)
Physical Memory (RAM): 3549 MB (3.722e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
Kindly help me at your earliest convenience. I am not even able to read a file of 48+MB size in double format.
There are two things you can do to clear up memory for MATLAB. First, since you're using a 32-bit version of the program, it is normally limited to 2GB of address space. Enabling the /3GB switch (a Windows boot option) makes an additional 1GB of address space available to the program.
Second, you should consider using the pack() function, which rearranges variables in RAM to free up more contiguous memory space. This, more than anything, is affecting your ability to open individual arrays.
Remember: you can figure out how many items an array will hold by dividing the memory amount available by the size of the variable type. Double variables take up 8 bytes each. Your 129MB of space available should allow around 16.85 million double values in a single array.
You can view information about memory usage using the memory functions included in MATLAB.
memory shows the memory information
inmem will show you the variables and functions stored in memory
clear will allow you to clear the memory of specific variables or functions.
You may try to set the /3GB switch; maybe this increases the usable memory. Otherwise: switch to a 64-bit OS. Your system wastes 547MB of RAM simply because there are no addresses for it.
I learnt that when we manage a data structure such as a tree or another graph, its nodes are stored in the computer in something called a block; the nodes of the graph make up the block, and it is the block that is transferred between secondary and primary memory when a data structure gets moved between them. So I think it's pretty clear what a block is: it can be of different sizes depending on the architecture, but is often 4K. Now I want to know how a block relates to memory pages. Do pages consist of blocks, or what is the relation of blocks to pages? Can we define what a page is in memory in terms of a block?
You typically try to define a block so it's either the same size as a memory page, or its size is evenly divisible by the size of a memory page, so an integral number of blocks will fit in a page.
As you mentioned, 4K tends to work well -- typical memory page sizes are 4K and 8K. Most also support at least one larger page size (e.g., 1 megabyte) but you can typically more or less ignore them; they're used primarily for mapping single, large chunks of contiguous memory (e.g., the part of graphics memory that's directly visible to the CPU).