This question already has answers here:
Why are two different concepts both called "heap"? [duplicate]
(9 answers)
Closed 8 years ago.
There are two concepts named "heap" in computer science. One is the memory pool used in memory management; the other is a data structure.
I know they are different, but what's the relationship between them? Or do they just happen to share the same name?
As far as I know, they just happen to have the same name.
One 'heap' is a data structure that keeps the greatest element of a collection at its head (in a max-heap; a min-heap keeps the smallest). Since each node's children only have to be no larger than the node itself, it is a partially ordered collection. This is more efficient than maintaining a fully sorted list in certain circumstances, such as when you are mostly interested in only the largest element. http://en.wikipedia.org/wiki/Heap_(data_structure)
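To make the data-structure sense concrete, here is a minimal sketch of an array-backed binary max-heap in C. The names `MaxHeap`, `heap_push`, `heap_pop` and the fixed capacity `HEAP_CAP` are my own choices for illustration, not from any library.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 64  /* hypothetical fixed capacity for this sketch */

typedef struct {
    int    data[HEAP_CAP];
    size_t len;
} MaxHeap;

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Insert a value, then sift it up until its parent is not smaller. */
void heap_push(MaxHeap *h, int value)
{
    assert(h->len < HEAP_CAP);
    size_t i = h->len++;
    h->data[i] = value;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->data[parent] >= h->data[i]) break;
        swap_int(&h->data[parent], &h->data[i]);
        i = parent;
    }
}

/* Remove and return the largest value, which is always at index 0. */
int heap_pop(MaxHeap *h)
{
    assert(h->len > 0);
    int top = h->data[0];
    h->data[0] = h->data[--h->len];   /* move the last element to the root */
    size_t i = 0;
    for (;;) {                        /* sift it down past larger children */
        size_t left = 2 * i + 1, right = left + 1, largest = i;
        if (left  < h->len && h->data[left]  > h->data[largest]) largest = left;
        if (right < h->len && h->data[right] > h->data[largest]) largest = right;
        if (largest == i) break;
        swap_int(&h->data[i], &h->data[largest]);
        i = largest;
    }
    return top;
}
```

Only the parent/child ordering is maintained, so siblings can be in any order; that is what keeps push and pop at O(log n) instead of the O(n) insert of a sorted array.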
The other 'heap' is the region of memory, separate from the stack, where data can be stored. The problem with data stored on the stack is that it is released, and thus lost, when the function returns; on top of that, the stack can only hold so much before overflowing. In simple allocators, the 'structure' of the heap is often a linked list of free blocks: malloc looks for a free block that satisfies the size requirement, marks it as in use, and returns it, whereas free finds the header for that block, marks it as unused, and puts it back on the list of free blocks. (Other optimizations include keeping dedicated lists for certain chunk sizes.)
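As a rough illustration of the memory-pool sense, here is a toy first-fit allocator over a static arena. It is a simplification under my own assumptions: `toy_malloc`, `toy_free`, `Block` and `ARENA_BYTES` are made-up names, every block (free or not) sits on one address-ordered list with an `in_use` flag rather than on a separate free list, and it needs C11 for `max_align_t`. Real `malloc` implementations are far more elaborate.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_BYTES 4096                       /* hypothetical pool size */
#define ALIGN       _Alignof(max_align_t)
#define ROUND_UP(n) (((n) + ALIGN - 1) / ALIGN * ALIGN)
#define HEADER      ROUND_UP(sizeof(Block))    /* header padded so payloads stay aligned */

typedef struct Block {
    size_t        size;     /* payload bytes that follow the header */
    int           in_use;
    struct Block *next;     /* next block in address order */
} Block;

static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static Block *first_block = NULL;

void *toy_malloc(size_t n)
{
    if (n == 0 || n > ARENA_BYTES) return NULL;
    if (first_block == NULL) {                 /* lazily create one big free block */
        first_block = (Block *)arena;
        first_block->size = ARENA_BYTES - HEADER;
        first_block->in_use = 0;
        first_block->next = NULL;
    }
    n = ROUND_UP(n);
    for (Block *b = first_block; b != NULL; b = b->next) {
        if (b->in_use || b->size < n) continue;
        if (b->size - n >= HEADER + ALIGN) {   /* split off the unused remainder */
            Block *rest = (Block *)((unsigned char *)b + HEADER + n);
            rest->size = b->size - n - HEADER;
            rest->in_use = 0;
            rest->next = b->next;
            b->next = rest;
            b->size = n;
        }
        b->in_use = 1;
        return (unsigned char *)b + HEADER;    /* caller sees only the payload */
    }
    return NULL;                               /* no free block is large enough */
}

void toy_free(void *p)
{
    if (p == NULL) return;
    Block *b = (Block *)((unsigned char *)p - HEADER);  /* step back to the header */
    assert(b->in_use);                         /* catches a double free */
    b->in_use = 0;
    if (b->next != NULL && !b->next->in_use) { /* merge with a free right neighbour */
        b->size += HEADER + b->next->size;
        b->next = b->next->next;
    }
}
```

Nothing in here resembles the ordering property of the heap data structure, which is the point being made.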
As you can see - not related at all!
(Context: The system I am working on already maintains a form of garbage collection. I'm working on compaction.)
Most compaction algorithms follow a basic structure:
Find first object
Move object to beginning of heap
Find second object
Move second object to address right after first object
Rinse and repeat
This algorithm is followed in section 2.2 of this paper except using two pointers, denoted "from" and "to". Essentially the FROM pointer traverses the heap until it finds live objects. Then it moves said object to the TO pointer. Then TO is incremented accordingly.
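A minimal sketch of that two-pointer sliding pass, under assumptions of my own: every object begins with a hypothetical `ObjHeader` holding its total size and a `live` flag that a separate mark phase has already set. The names `compact` and `put_object` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t size;   /* total bytes of this object, header included */
    int    live;   /* assumed to be set already by a separate mark phase */
} ObjHeader;

/* Slide every live object toward the start of heap[0..used).
   Returns the new number of bytes in use. */
size_t compact(unsigned char *heap, size_t used)
{
    size_t from = 0, to = 0;
    while (from < used) {
        ObjHeader h;
        memcpy(&h, heap + from, sizeof h);      /* read header without alignment assumptions */
        assert(h.size >= sizeof h && h.size <= used - from);
        if (h.live) {
            if (to != from)
                memmove(heap + to, heap + from, h.size);
            to += h.size;                       /* TO advances only past live objects */
        }
        from += h.size;                         /* FROM visits every object */
    }
    return to;
}

/* Helper for the example: write an object with 8 payload bytes set to `tag`. */
size_t put_object(unsigned char *heap, size_t at, int live, unsigned char tag)
{
    ObjHeader h = { sizeof(ObjHeader) + 8, live };
    memcpy(heap + at, &h, sizeof h);
    memset(heap + at + sizeof h, tag, 8);
    return at + h.size;
}
```

This sketch only moves bytes. It does not update any pointers that referred to the old locations, and it does not say where the `live` flag comes from.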
The algorithm is simple, but I have yet to find much information on how these pointers determine what is a "live object". This article discusses the creation of a basic mark-and-sweep garbage collector that runs through the stack, recursively following each reference and marking the objects it reaches as live. However, the article requires a linked list of ALL objects ever allocated, because the author is more or less creating their own VM.
My question is, is there a way of traversing a heap in C and identifying whether the current object is a live object? Is there a similar linked list of all allocated objects already in C that I could use? Or will I require more overhead?
My question is, is there a way of traversing a heap in C and identifying whether the current object is a live object?
At a high level, the process looks at all active pointers and determines whether each piece of allocated memory is reachable. (Please note that this is very complicated in C, not least because a pointer could be stored in an int or another data type.) If the memory is reachable via a pointer, then it is "live" in your terms. If not, a garbage collector would consider it safe to free that memory.
If you're asking whether or not C has a native function for determining whether or not some allocated memory can be reached, then the answer is no.
Is there a similar linked list of all allocated objects already in C that I could use? Or will I require more overhead?
Again, if you're looking for a linked list that C natively provides and you can access, then the answer is no. You'd need to implement these things.
Forgive me if you've already seen this, but there are garbage collectors that you can download if you want to see how others have done it.
TL;DR: It's impossible.
To make that work, you need to solve some non-trivial problems:
1. Be able to name the live objects of the heap. That means finding and recursively following all pointers in global variables and on the stack.
2. Move the live objects downwards to create a compact heap.
3. Adjust pointers in your program to reflect the new locations of the moved objects.
Regarding 1.: At runtime, the C language doesn't help you to identify where you have pointer-type global variables. And on the stack, you find a mixture of e.g. integers, function-call return addresses or data pointers. For both memory areas, you have to find a way to enumerate all potential pointer values.
To make things worse, a pointer can not only point to the beginning of your data structure, but also to some inside element. And this pointer also makes the whole object "live".
Regarding 2.: That's the easy part, using the algorithm you mentioned.
Regarding 3.: Now your objects live at new addresses, so your old pointer values are no longer correct (they point to the old locations), and you have to adjust them. So once again, you have to follow all root references (as in 1.) and adjust all pointers that are affected by your moves. But as you can't tell for sure whether e.g. 0x12345678 was meant as a numeric integer or as an (old-location) address, changing it to the new-location address might break some computation.
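The ambiguity in points 1 and 3 can be sketched as a conservative root scan: treat every word as a possible pointer and mark any block it falls into, including interior addresses. This is only an illustration under my own assumptions; `BlockInfo`, `block_containing` and `mark_from_roots` are hypothetical names, the table of allocated blocks is something you would have to maintain yourself, and the recursive tracing through the blocks' contents is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t start;    /* first address of an allocated block */
    size_t    size;     /* its length in bytes */
    int       marked;   /* set when some root word might point into it */
} BlockInfo;

/* Return the block that `word` points into (start or interior), or NULL. */
BlockInfo *block_containing(BlockInfo *blocks, size_t n, uintptr_t word)
{
    for (size_t i = 0; i < n; i++)
        if (word >= blocks[i].start && word - blocks[i].start < blocks[i].size)
            return &blocks[i];
    return NULL;
}

/* Conservatively mark every block that some root word might refer to.
   A plain integer that happens to look like an address marks a block too. */
void mark_from_roots(BlockInfo *blocks, size_t n,
                     const uintptr_t *roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++) {
        BlockInfo *b = block_containing(blocks, n, roots[i]);
        if (b != NULL) b->marked = 1;
    }
}
```

A scan like this can keep objects alive, but it cannot justify rewriting a root word after a move, because the word may have been an integer all along. That is why collectors that scan conservatively generally leave such objects where they are.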
Determine size of dynamically allocated memory in C
I have read in: How does free know how much to free? that when one has some memory allocation denoted with a pointer such as
float (*ptr)[10] = malloc(sizeof(float) * 100);
for a 10x10 array, ptr has a "head" to it with "accounting" information telling of the "step size" and what not so that you can properly perform pointer arithmetic and use free and whatnot.
Is there a consistent (not architecture-dependent) and reliable (defined-behavior) way to get one's hands on this information?
I have read elsewhere that the de facto way to track array length when there are casts and dynamic memory allocations about is to manually allocate a slot to store the size. This naturally leads me to believe the answer to my question is 'no' yet I think I'd rather not make assumptions or I'll get my own sort of memory leakage.
Converting comments into an answer.
There is no defined standard way to get at the 'size of the block of allocated memory'. Each implementation has to have a way of knowing the size of each block it allocates, but there's no way for a programmer using the implementation to know the size (in general).
So it is dependent on some number of things, but if all is known, system, architecture, compiler, you're saying there is no resource to find out how things are formatted in memory?
There is no standard (neither de jure nor de facto standard) way to get at the information about the size of a block of memory allocated. All else apart, the size allocated by the library is usually bigger than the size requested (definitely because of the housekeeping data, but even the data portion may be rounded up to a multiple of 8 or a multiple of 16) — should the code report the size requested or the size allocated?
And, as 1201ProgramAlarm noted, one option on open source systems is to look at the C library's implementation of malloc() and free() to see what it does and devise a mechanism to provide the answer to the programmer. However, any such research is specific to that system — different systems will do it differently, in general — and the whole idea runs into a stone wall if the system is a closed source system.
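Since the library's own bookkeeping is off limits, the portable route is the one mentioned in the question: store the size yourself. Here is a minimal sketch, assuming wrapper names of my own (`sized_malloc`, `sized_size`, `sized_free`) and C11 for `max_align_t`. It records the size requested, not whatever the library actually reserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header padded to max_align_t so the payload stays suitably aligned. */
typedef union {
    size_t      size;
    max_align_t pad;
} SizeHeader;

/* Allocate n bytes plus a private header that remembers n. */
void *sized_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(SizeHeader)) return NULL;   /* avoid overflow */
    SizeHeader *h = malloc(sizeof *h + n);
    if (h == NULL) return NULL;
    h->size = n;
    return h + 1;                 /* the caller only sees the payload */
}

/* Requested size of a block obtained from sized_malloc. */
size_t sized_size(const void *p)
{
    assert(p != NULL);
    return ((const SizeHeader *)p - 1)->size;
}

/* Must be used instead of free() for pointers from sized_malloc. */
void sized_free(void *p)
{
    if (p != NULL) free((SizeHeader *)p - 1);
}
```

Pointers from these wrappers must never be passed to plain `free()`, and pointers from plain `malloc()` must never be passed to `sized_size()`.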
I’m stuck trying to figure out what, exactly, are the definitions of the following stack implementations and the advantages and disadvantages associated with each.
1. Array-based implementation
2. Linked implementation
3. Blocked implementation
Any help would be much appreciated.
A user of a stack implementation expects a dynamic data structure. C does not provide that directly in the language. You have to allocate and manage the memory on your own.
Allocating an array, and therefore a statically sized structure, is very easy. However, the problem is that the maximum number of entries is limited. If you make the array too small, this will cause errors; if you make the array very big, you are wasting memory.
A solution for this is to dynamically reallocate the array when the number of entries exceeds the array size. But this may move the whole array to another place in memory, which has some disadvantages (i.e., you may have to copy the whole array, and it is not safe to hold a pointer to a specific entry across a reallocation).
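A minimal sketch of such a growing array-based stack. The names `ArrayStack`, `astack_push`, `astack_pop` and `astack_free`, the starting capacity of 4 and the doubling policy are my own assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int   *items;
    size_t len;   /* entries currently on the stack */
    size_t cap;   /* entries the array can hold */
} ArrayStack;

/* Returns 0 on success, -1 if memory could not be obtained. */
int astack_push(ArrayStack *s, int v)
{
    if (s->len == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 4;
        int *p = realloc(s->items, new_cap * sizeof *p);  /* may move the whole array */
        if (p == NULL) return -1;                         /* old array is still valid */
        s->items = p;
        s->cap = new_cap;
    }
    s->items[s->len++] = v;
    return 0;
}

int astack_pop(ArrayStack *s)
{
    assert(s->len > 0);
    return s->items[--s->len];
}

void astack_free(ArrayStack *s)
{
    free(s->items);
    s->items = NULL;
    s->len = s->cap = 0;
}
```

Start from `ArrayStack s = {0};`. Any `int *` taken into `s.items` before a push may dangle afterwards, which is the pointer problem described above; storing indexes instead avoids it.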
A linked list is the opposite. You (dynamically) allocate memory for each entry, and you can free the memory for a single entry when removing it from the stack. This sounds better, but has the caveat that you spend a pointer's worth of memory on each entry. Typically that is about the size of an entry, so you double the memory consumption. Besides this, allocating small pieces of memory over and over wastes memory, too, because each allocation carries its own bookkeeping overhead.
So you can implement a compromise: a linked list of arrays. You allocate a block for a number of entries, let's say 256. Then you fill that block with entries, without reallocating or allocating memory. If the number of entries exceeds that value, you allocate a new block for an additional 256 entries. The blocks are linked, so it is a linked list of arrays.
Especially for a stack, where you do not have removals in the middle of the structure, this is the best implementation in most cases.
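For comparison, a minimal sketch of the linked version, with one allocation per entry. `Node`, `lstack_push` and `lstack_pop` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int          value;
    struct Node *next;   /* the per-entry pointer overhead */
} Node;

/* Push at the head. Returns 0 on success, -1 if allocation fails. */
int lstack_push(Node **top, int v)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->value = v;
    n->next = *top;
    *top = n;
    return 0;
}

/* Pop from the head and release that entry's memory immediately. */
int lstack_pop(Node **top)
{
    assert(*top != NULL);
    Node *n = *top;
    int v = n->value;
    *top = n->next;
    free(n);
    return v;
}
```

An empty stack is just `Node *top = NULL;`. Entries never move, so pointers to them stay valid until they are popped.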
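A minimal sketch of the blocked variant, assuming 256 entries per block as in the description. `Chunk`, `BlockStack`, `bstack_push` and `bstack_pop` are names I made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_ENTRIES 256

typedef struct Chunk {
    int           items[BLOCK_ENTRIES];
    struct Chunk *prev;    /* the block below this one */
} Chunk;

typedef struct {
    Chunk *top;    /* newest block, NULL when the stack is empty */
    size_t used;   /* entries used in the newest block */
} BlockStack;

/* Returns 0 on success, -1 if a new block could not be allocated. */
int bstack_push(BlockStack *s, int v)
{
    if (s->top == NULL || s->used == BLOCK_ENTRIES) {
        Chunk *c = malloc(sizeof *c);     /* one allocation per 256 pushes */
        if (c == NULL) return -1;
        c->prev = s->top;
        s->top = c;
        s->used = 0;
    }
    s->top->items[s->used++] = v;
    return 0;
}

int bstack_pop(BlockStack *s)
{
    assert(s->top != NULL && s->used > 0);
    int v = s->top->items[--s->used];
    if (s->used == 0) {                   /* newest block is empty: release it */
        Chunk *c = s->top;
        s->top = c->prev;
        free(c);
        s->used = s->top ? BLOCK_ENTRIES : 0;   /* the block below is always full */
    }
    return v;
}
```

One design choice to be aware of: freeing a block as soon as it empties means a push/pop sequence that hovers at a block boundary allocates and frees repeatedly. Keeping one spare block around would avoid that.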
Think about how much space each data structure takes, and whether you can do the stack operations on them efficiently; i.e., in O(1) time.
For a basic stack, you need to be able to push new elements onto the stack, and pop the most recently pushed (top) element off. You probably also want to peek at the top element, and check if the stack is empty.
A dynamically sized array (or block, if I understand the OP's comment correctly) is fine for a stack. It may be advantageous in certain situations if you will be accessing and changing the stack a lot, and want to avoid the small amount of extra work of allocating and destroying memory with each push or pop. It also gives you direct, indexed access to everything in the stack, for extended functionality. The disadvantage is that the stack will use some extra space.
You can use a singly linked list for a stack as well, pushing and popping at the head. This is probably the most common type of structure used if you don't need extended functionality like direct access to the elements besides the head, and if you aren't trying to implement something on the bleeding edge of time efficiency.
I am working in embedded C on a task-related implementation in an OS. I have implemented a linked list. Now I need to minimize the use of pointers to satisfy MISRA C, so I am looking for the best alternative to the linked list in my present implementation for task operations in an embedded OS.
It'd be easy to use a static array of structures to completely avoid pointers (you'd just use array indexes and not pointers). This has both advantages and disadvantages.
The disadvantages are:
you have to implement your own allocator (to allocate and free "array elements" within the static array)
the memory used for the array can't be used for any other purpose when it's not being used for the linked list
you have to determine a "max. number of elements that could possibly be needed"
it has all the same problems as pointers. E.g. you can access an array element that was freed, free the same array element multiple times, use an index that's out of bounds (including the equivalent of NULL if you decide to do something like use -1 to represent NULL_ELEMENT), etc.
The advantages are:
by implementing your own allocator you can avoid the mistakes caused by malloc(), including (e.g.) checking something isn't already free when freeing it and returning an error instead of trashing your own metadata
allocation can typically be simpler/faster, because you're only allocating/freeing one "thing" (array element) at a time and don't need to worry about allocating/freeing a variable number of contiguous "things" (bytes) at a time
entries in your list are more likely to be closer (in memory) to each other (unlike for malloc() where your entries are scattered among everything else you allocate), and this can improve performance (cache locality)
you have a "max. number of elements that could possibly be needed" to make it far easier to track down problems like (e.g.) memory leaks; and (where memory is limited) make it easier to determine things like worst case memory footprint
it satisfies pointless requirements (like "no pointers") despite not avoiding anything these requirements are intended to avoid
Now it needs to minimize the use of pointers to satisfy MISRA C
I used to work with some embedded engineers. They built low-end (and high-end) routers and gateways. Rather than dynamically allocating memory, they used fixed buffers provisioned at boot. They then tracked indexes into the array of provisioned buffers.
Static arrays and indexes beg for a cursor data structure. Your first search hit is Cursor Implementation of Linked Lists from Data Structures and Algorithm Analysis in C++, 2nd ed. by Mark Weiss. (I actually used that book in college years ago.)
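A minimal sketch of the static-array, index-linked approach with its own tiny allocator. Everything here is an assumption for illustration: the pool size `MAX_TASKS`, the `NIL` sentinel, and the names `TaskNode`, `pool_init`, `node_alloc`, `node_free` and `task_add`. It has not been checked against any MISRA C rule set.

```c
#include <assert.h>

#define MAX_TASKS 8       /* hypothetical worst-case number of list entries */
#define NIL       (-1)    /* index playing the role of NULL */

typedef struct {
    int task_id;
    int next;      /* index of the next node, or NIL */
    int in_use;
} TaskNode;

static TaskNode pool[MAX_TASKS];
static int free_head = NIL;   /* head of the list of unused nodes */
static int list_head = NIL;   /* head of the task list */

/* Chain every node onto the free list. */
void pool_init(void)
{
    for (int i = 0; i < MAX_TASKS; i++) {
        pool[i].in_use = 0;
        pool[i].next = (i + 1 < MAX_TASKS) ? i + 1 : NIL;
    }
    free_head = 0;
    list_head = NIL;
}

/* Take a node from the free list; returns NIL if the pool is exhausted. */
int node_alloc(void)
{
    int i = free_head;
    if (i != NIL) {
        free_head = pool[i].next;
        pool[i].in_use = 1;
        pool[i].next = NIL;
    }
    return i;
}

/* Returns 0 on success, -1 on an out-of-range index or a double free. */
int node_free(int i)
{
    if (i < 0 || i >= MAX_TASKS || !pool[i].in_use) return -1;
    pool[i].in_use = 0;
    pool[i].next = free_head;
    free_head = i;
    return 0;
}

/* Add a task at the head of the list. Returns 0, or -1 if the pool is full. */
int task_add(int task_id)
{
    int i = node_alloc();
    if (i == NIL) return -1;
    pool[i].task_id = task_id;
    pool[i].next = list_head;
    list_head = i;
    return 0;
}
```

The `in_use` check is what lets this allocator report a double free as an error instead of corrupting its free list, which is the first advantage listed above. A stale index is still as dangerous as a stale pointer.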
Possible Duplicate:
C programming : How does free know how much to free?
Hi,
When using malloc(), we specify the size of the allocation, so it knows how much to allocate. However, how does free() know how many bytes to release? The pointer contains only the starting address of the memory block, not the length of the memory block.
Thanks and Regards,
Tazim.
The answer to this is implementation-dependent; the malloc library keeps track of the length somehow, but exactly how it does so is not specified by the C language standard.
A typical approach is to store some header information (including length) before the "starting address" that malloc returns to the caller.
malloc saves the size of the allocated block in some kind of data structure. When you call free, it looks up the entry in this data structure and frees that much memory.
When you allocate memory, the run-time library also maintains some internal structures. It has to in order to keep track of what parts of the heap have been allocated. This information also tells it the size of a block of memory given a pointer to that memory.
It depends on the implementation but there's a good chance this information is stored just before the pointer returned.
It's implementation-specific. Some techniques:
There may be more than one pointer. An arbitrarily complex structure could be allocated and you just get a pointer to the user-payload area. The library knows the fixed offset between the pointer given to you and the pointer to the origin of the structure. The other fields could be the size and the links that thread free blocks together.
There may be a separate dictionary. This can have memory-management advantages. One problem with using the allocated block for book-keeping is that the library itself ends up writing to many if not most of the allocated pages. This keeps them dirty (in an MMU sense) and can also prevent them from being shared following a fork. This is a big problem for web servers and has led to specialized implementations of web language systems ("Ruby Enterprise") that differ mainly in memory management core.
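The separate-dictionary idea can be sketched at user level as a side table that maps each pointer to its size, so nothing is written next to the payload. This is only an illustration: `tracked_malloc`, `tracked_size`, `tracked_free` and the fixed capacity `MAX_TRACKED` are my own names, and a real allocator would use something much faster than a linear search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 128   /* hypothetical table capacity */

typedef struct { void *ptr; size_t size; } Entry;

static Entry table[MAX_TRACKED];   /* ptr == NULL marks an empty slot */

/* Linear search for the slot holding p; find(NULL) yields an empty slot. */
static Entry *find(void *p)
{
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (table[i].ptr == p) return &table[i];
    return NULL;
}

void *tracked_malloc(size_t n)
{
    Entry *slot = find(NULL);
    if (slot == NULL) return NULL;      /* table is full */
    void *p = malloc(n ? n : 1);
    if (p == NULL) return NULL;
    slot->ptr = p;
    slot->size = n;
    return p;
}

/* Size recorded for p, or 0 if p is not a tracked pointer. */
size_t tracked_size(void *p)
{
    Entry *e = p ? find(p) : NULL;
    return e ? e->size : 0;
}

void tracked_free(void *p)
{
    Entry *e = p ? find(p) : NULL;
    assert(p == NULL || e != NULL);     /* catches frees of unknown pointers */
    if (e != NULL) {
        free(p);
        e->ptr = NULL;
        e->size = 0;
    }
}
```

Because the sizes live in `table` rather than beside each block, looking one up or changing it never touches the pages that hold the user data.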