I would like (in *nix) to allocate a large, contiguous address space, but without consuming resources straight away, i.e. I want to reserve an address range and allocate from it later.
Suppose I do foo=malloc(3*1024*1024*1024) to allocate 3G, but on a 1G computer with 1G of swap file. It will fail, right?
What I want to do is say "Give me a memory address range foo...foo+3G into which I will be allocating" so I can guarantee all allocations within this area are contiguous, but without actually allocating straight away.
In the example above, I want to follow the foo=reserve_memory(3G) call with a bar=malloc(123) call which should succeed, since reserve_memory hasn't consumed any resources yet; it just guarantees that bar will not be in the range foo...foo+3G.
Later I would do something like allocate_for_real(foo,0,234) to consume bytes 0..234 of foo's range. At this point, the kernel would allocate some pages and map them to foo...foo+234 (rounded up to a page boundary).
Is this possible in userspace?
(The point of this is that objects in foo... need to be contiguous and cannot reasonably be moved after they are created.)
Thank you.
Short answer: it already works that way.
Slightly longer answer: the bad news is that there is no special way of reserving a range, but not allocating it. However, the good news is that when you allocate a range, Linux does not actually allocate it, it just reserves it for use by you, later.
The default behavior of Linux is to always accept a new allocation, as long as there is address range left. When you actually start using the memory though, there better be some memory or at least swap backing it up. If not, the kernel will kill a process to free memory, usually the process which allocated the most memory.
So the problem in Linux with default settings gets shifted from, "how much can I allocate", into "how much can I allocate and then still be alive when I start using the memory?"
Here is some info on the subject.
I think, a simple way would be to do that with a large static array.
On any modern system this will not be mapped to existing memory (in the executable file on disk or in RAM of your execution machine) unless you really access it. Once you access it (and the system has enough resources) it will be miraculously initialized to all zeros.
And your program will seriously slow down once you reach the limit of physical memory and then randomly crash if you run out of swap.
Good morning. I hope some can help me out understanding how one aspect of virtual memory works and how C behaves.
From what I understand, whenever we call malloc, C will add it to the heap, with the pointers going upwards. If the stack and the heap bump on each other, malloc will return NULL, since there is no more memory to work with.
What I do not understand is the fact that the virtual memory of each program is seized when we run it, and the low and high addresses of the running program itself are determined. This way, the program has a fixed amount of memory to use. Is the heap growing with the data on it, or is the heap actually just a set of pointers to the actual data? If the program has a fixed amount of memory at the beginning (because it can't have all the memory), it does not make sense to me to store the raw data in the heap, or else we would easily run out of available memory. What am I missing?
You are making several incorrect assumptions. The most important one being that your program has one chunk of memory assigned to it that starts at address x and goes to address x + program size. This is not so; your program is divided into chunks (different platforms give them different names). The stack will be one, the heap may be several, the code will be in several, etc.
When the heap manager runs out of its current chunk it can simply get another one.
Also note that this has nothing to do with 'virtual memory'.
When we use malloc() we provide a size in bytes.
When we use free() we provide nothing.
This is because the OS of course knows about it already, it must have stored the information somewhere.
By the way, our software must also remember how many memory blocks it has requested, so that we can (for instance) safely iterate starting from the pointer and going ahead.
So, my question is: isn't this redundant? Can't we simply ask the OS the size of the memory pointed by a given pointer since it knows it? And if not, why not?
When we use malloc() we provide a size in bytes. When we use free() we provide nothing. This is because the OS of course knows about it already, it must have stored the information somewhere.
Even though it gives you memory and it keeps track of what memory range belongs to your process, the OS doesn't concern itself with the internal details of your memory. malloc stores the size of the allocated chunk in its own place, also reserved inside your process (usually, it's a few bytes before the logical address returned by malloc). free simply reads that reserved information and deallocates automatically.
By the way, our software must also remember how many memory blocks it has requested, so that we can (for instance) safely iterate starting from the pointer and going ahead.
So, my question is: isn't this redundant? Can't we simply ask the OS the size of the memory pointed to by a given pointer since it knows it? And if not, why not?
Given the above, it is redundant to store that information, yes. But you pretty much have to store it, because the way malloc does its book-keeping is an implementation detail.
If you know how your particular implementation works and you want to take that risk for your software, you are free (no pun intended) to do it. If you don't want to base your logic on an implementation detail (and you'd be right not to want to), you'll have to do this redundant book-keeping side-by-side with malloc's own book-keeping.
No, it's not redundant. malloc() manages, in cooperation with free() and a few other functions, a zillion tiny, individually addressed blocks within relatively large blocks which are generally obtained with sbrk(). The OS only knows about the large range(s), and has no clue which tiny blocks within them are in use or not. To add to the differences, sbrk() only lets you move the end of your data segment, not split it into parts to free independently. Though one could allocate memory using sbrk exclusively, you would be unable to free arbitrary chunks for reuse, coalesce smaller chunks into larger ones, or split chunks, without writing a bunch of bookkeeping code for this purpose - which ends up essentially being the same as writing malloc. Additionally, using malloc/free/... allows you to call sbrk only rarely, which is a performance bonus since sbrk is a system call with special overhead.
When we use free() we provide nothing.
Not quite true; we provide the pointer that was returned by malloc.
Can't we simply ask the OS the size of the memory pointed by a given pointer since it knows it?
Nope. Pointers are simply addresses; apart from their type, they carry no information about the size of the object they point to. How malloc/calloc/realloc and free keep track of object sizes and allocated vs. free blocks is up to the individual implementation; they may reserve some space immediately before the allocated memory to store the size, they may build an internal map of addresses and sizes, or they may do something else completely.
It would be nice if you could query a pointer for the size of the object it points to; unfortunately, that's simply not a feature of the language.
First, here is where I got the idea from:
There was once an app I wrote that used lots of little blobs of memory, each allocated with malloc(). It worked correctly but was slow. I replaced the many calls to malloc with just one, and then sliced up that large block within my app. It was much much faster.
I was profiling my application, and I got an unexpectedly nice performance boost when I reduced the number of malloc calls. I am still allocating the same amount of memory, though.
So, I would like to do what this guy did, but I am unsure what's the best way to do it.
My Idea:
// static global variables
static Struct1 *memoryForStruct1;
static int struct1Index = 0;
...
// at startup, once:
memoryForStruct1 = malloc(sizeof(Struct1) * 10000);
...
// somewhere, I need memory, fast:
Struct1 *data = &memoryForStruct1[struct1Index++];
...
// done with data:
--struct1Index;
Gotchas:
I have to make sure I don't exceed 10000
I have to release the memory in the same order I occupied it. (Not a major issue in my case, since I am using recursion, but I would like to avoid it if possible.)
Inspired by Mihai Maruseac:
First, I create a linked list of ints that basically tells me which memory indexes are free. I then added a field to my struct called int memoryIndex, which lets me return the occupied memory in any order. And luckily, I am sure my memory needs will never exceed 5 MB at any given time, so I can safely allocate that much memory. Solved.
The system call which gives you memory is brk. The usual malloc, calloc, and realloc functions simply use the space given by brk. When that space is not enough, another brk call is made to create new space. Usually, the space is increased in multiples of the virtual memory page size.
Thus, if you really want to have a premade pool of objects, make sure to allocate memory in multiples of the page size. For example, you can create one pool of 4KB, 8KB, ... space.
Next idea, look at your objects. Some of them have one size, some have other size. It will be a big pain to handle allocations for all of them from the same pool. Create pools for objects of various sizes (powers of 2 is best) and allocate from them. For example, if you'll have an object of size 34B you'd allocate space for it from the 64B pool.
Lastly, the remaining space can be either left unused or it can be moved down to the other pools. In the above example, you have 30B left. You'd split it in 16B, 8B, 4B and 2B chunks and add each chunk to their respective pool.
Thus, you'd use linked lists to manage the preallocated space. Which means that your application will use more memory than it actually needs but if this really helps you, why not?
Basically, what I've described is a mix between buddy allocator and slab allocator from the Linux kernel.
Edit: After reading your comments, it will be pretty easy to allocate a big area with malloc(BIG_SPACE) and use this as a pool for your memory.
If you can, look at using glib, which has a memory-slicing API that supports this. It's very easy to use, and saves you from having to re-implement it.
I've been told not to use stack allocated arrays because the stack is a precious resource.
Other people have suggested to me that in fact it is perfectly fine to use stack allocated arrays so long as the array is relatively small.
I would like to have a general rule of thumb: when should I use a stack allocated array?
And when should I use a heap allocated array?
All of your memory is limited; even today, with enormous amounts of RAM and virtual memory, there is still a limit. However, the heap's limit is rather large, especially compared with the stack, which can be anything from a couple of KB on small embedded systems to a couple of megabytes on a PC.
Besides that, there is also the question about how you are using it, and for what. For example, if you want to return an "array" from a function, it should never be on the stack.
In general, I would say: try to keep arrays on the stack small if you can. If you are creating an array with thousands of entries on the stack you should stop and think about what you want it for.
It depends on your platform.
Nowadays, if working on the popular x64 platform, you don't really have to worry about it.
Depending on the Operating System you use, you can check how much stack space and how much heap space a userland process is allowed to use.
For example, UNIX-like systems have soft and hard limits. Some you can crank up, some you can not.
Bottom line is that you don't usually need to worry about such things. And when you need to know, you are usually tied so closely to the platform you'll be developing for that you know all these details.
Hope I answered your question. If you want specific values please specify your exact hardware, operating system and user privileges.
The answer to this question is context dependent. When you write for an operating system kernel, for example, the stack might be quite limited, and allocating more than a thousand bytes in a stack frame could cause a problem.
In modern consumer systems, the space available for the stack is typically quite large. One problem systems used to have was that address space was limited and, once the stack was assigned an address, it could not grow any further than the next object in the address space in the direction of stack growth, regardless of the availability of physical memory or of virtual memory elsewhere in the address space. This is less of a problem with today’s address spaces.
Commonly, megabytes of space can be allocated in a stack frame, and doing so is cheap and easy. However, if many routines that allocate large amounts of space are called, or one or a few routines that allocate large amounts of space are called recursively, then problems can occur because too much space is used, running into some limit (such as address space or physical memory).
Of course, running into a physical memory limit will not be alleviated by allocating space from the heap. So only the issue of consuming the address space available for the stack is relevant to the question of whether to use stack or heap.
A simple test for whether this is a problem is to insert use of a great deal of stack space in your main routine. If you use additional stack space and your application still functions under a load that uses large amounts of stack space normally, then, when you remove this artificial reservation in main, you will have plenty of margin.
A better way would be to calculate the maximum your program could use and compare that to the stack space available from the system. But that is rarely easy with today’s software.
If you are running into stack space limits, your linker or your operating system may have options to make more available.
The scope of global and static variables is the whole life of the process: memory for these variables is allocated when the process starts and freed only when it exits.
A local (stack) variable, by contrast, has scope only within the function in which it is defined: memory is allocated when the function is invoked and freed once control exits the function.
The main intention of dynamic memory is to create a variable with a user-defined scope. If you want to control the scope of a variable, you can allocate memory for it in one function, pass the reference (address) to as many functions as you want, and finally free it.
So with the help of dynamically allocated memory, we can create a variable whose lifetime is longer than a local variable's and shorter than a global or static variable's.
Apart from this, if the size is very large it is better to go for dynamic memory, especially if the architecture has memory constraints.
A good reason to use heap-allocated memory is passing its ownership to some other function or struct. On the other hand, the stack gives you memory management for free: you cannot forget to deallocate stack memory, while there is a risk of leaks if you use the heap.
If you create an array just for local usage, the criterion is its size; however, it is hard to give an exact size above which memory should be allocated on the heap. Some would say a few hundred bytes is enough to move to the heap; for others the threshold will be lower or higher.
When I compile and run the following code :(using gcc on cygwin)
int *a = malloc(1024*1024*100*sizeof(int));
while(1)
;
The task manager in Windows XP shows
memory usage by this process as 2232K,
which according to me should have been around 400000K.
When I compile and run the following code :(using gcc on cygwin)
int *a = malloc(1024*1024*400*sizeof(int));
while(1)
;
the memory usage goes down to 1388K;
So, rather than showing an increase, it actually
shows a decline.
What could explain this?
You have allocated the memory, making it available, but have not yet used it (reading or writing from/to it). The memory manager may not have actually allocated the physical memory to your program yet, merely said that you can have it. If you write something across the memory you just allocated (e.g. filling it with 0's -- look at memset for that), I would expect that the memory usage would be more in line with what you expect.
The second malloc would allocate 1600MiB (check your units). My guess is that this is more than your system can accommodate in a single process, so the second malloc fails. For some reason, you have a high overhead of other stuff in your application which causes memory usage to be high even though the malloc failed.
Print a to be certain.
Unfortunately memory consumption is not as simple as a single number. There are numerous ways in which memory needs to be tracked (and it differs a bit between operating systems).
For instance on Windows, here are some of the different memory usage types
Virtual Memory
Physical Memory
Committed memory
Reserved Memory
Shared Memory
Can you give us more details on exactly which number you are talking about?
One possible explanation is that you are looking at the physical memory usage of the process. The operating system will commonly allocate virtual memory address but not commit it to physical memory until it is actually used by your process.
One way to verify this would be to set up a for loop that wrote to every element in the array and then check the memory usage of the application.
The question of allocatable memory using malloc with gcc under cygwin is discussed at http://www.cygwin.com/cygwin-ug-net/setup-maxmem.html.
It would also be good to check the return from malloc.
If you have optimization enabled and a is not actually used, both the variable and the allocation may be removed. You can avoid this by declaring the variable as volatile.
http://en.wikipedia.org/wiki/Copy-on-write
Another use is in the calloc function. This can be implemented by having a page of physical memory filled with zeroes. When the memory is allocated, the pages returned all refer to the page of zeroes and are all marked as copy-on-write. This way, the amount of physical memory allocated for the process does not increase until data is written. This is typically only done for larger allocations.