How do pointers stay valid when objects move in memory? - c

Imagine in C I allocate two structs on the heap. One of the structs has a field which holds a pointer to the other struct.
As far as I know, data in the heap may move, thus addresses of things change. For example, defragmentation on the heap may occur, moving the second struct to a different place in the heap.
This may help explain what I'm talking about:
https://en.m.wikibooks.org/wiki/Memory_Management/Memory_Compacting
The pointer to this struct would now be wrong (i.e. holding the wrong memory address).
I don't mean this question as specific to C, but more general: at any time, the platform may decide to move things around. How do pointers stay valid?

The key concept here is virtual memory. Your pointers do not point to a physical address, but rather to a virtual one in the virtual address space of your process. What you said is correct, data may get moved around, even swapped out to disk and then mapped again into the physical memory onto another frame, but the virtual address that your pointer points to stays always the same.

The C standard does not permit the implementation to (spontaneously) move things around in such a way that would invalidate an existing pointer. It's possible that an implementation could exist that does "defragment" the heap, but I don't know of any implementations that do.
I said "spontaneously" because realloc() calls in your code may actually cause the object to move; that's why realloc returns a pointer. If the pointer returned by realloc is different from the original pointer, the original pointer (and any pointers that aliased it) are invalid. But this is something you have to keep track of in your own code.
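As a minimal sketch of the bookkeeping this implies: keep the result of realloc in a temporary and overwrite the owning pointer only on success. The helper name `grow_ints` is made up for illustration; it is not a standard function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resize *arr so that it can hold new_count ints.
 * Returns 0 on success, -1 on failure (in which case *arr is still valid). */
int grow_ints(int **arr, size_t new_count)
{
    assert(arr != NULL);
    if (new_count == 0 || new_count > SIZE_MAX / sizeof **arr)
        return -1;      /* refuse zero-size and overflowing requests */

    int *tmp = realloc(*arr, new_count * sizeof **arr);
    if (tmp == NULL)
        return -1;      /* the old block is untouched and still owned by *arr */

    *arr = tmp;         /* the block may have moved: use only the new pointer */
    return 0;
}
```

Any other copies of the old pointer held elsewhere in the program would still need to be refreshed by the caller; this helper only updates the one variable it is given.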
Managed languages (Java, C#, Python, whatever) may (or may not) deal with heap fragmentation by adding an additional level of indirection and/or keeping track of pointers into the heap. That way the language runtime can update all the pointers to object X when X moves to a different place. That would be taken care of by the garbage collection system.
It would be somewhat unusual for a C implementation to provide a garbage collector, and probably can't be done in a standards conformant way due to all the things you can (safely) do with pointers. So the premise of your question, that the heap may be spontaneously defragmented by the implementation, is not valid.

When you see a pointer in C, you observe something that looks like a memory address but, in practice, there will be one if not two levels of abstraction between this and the physical memory address.
Not only does this help make operating systems more secure, but it also allows them to perform any defragmentation tasks (whatever they are) without changing what is observed by your C program.

Related

Why does `realloc` not re-allocate in-place when possible?

From c99:
The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
[..]
The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
I am surprised that the standards do not specify that realloc should "try" to do in-place reallocation. Typically, if the reallocation size is lower than the currently allocated size, I would have expected the standards to ensure that the realloc would return the same pointer.
Is there a logic for the standards not to specify that realloc should be in-place if the size is reduced?
The notion of "try" doesn't mean much in the context of a standard - how hard does the implementer have to try before they have tried enough? How would one measure compliance?
Many common implementations will work exactly as you suggest: If you're resizing downward, or even resizing upward and the following memory happens to be free, they might return the original pointer after adjusting the housekeeping but not having to copy any data. Yay!
But I can think of lots of reasons why an allocator would not do this even if it were possible:
Some allocators keep different arenas for different sizes, where (making this up) it's a different pool for chunks from 1-128 bytes than there are for 64kbytes and larger. The whole scheme breaks down if the "big" pool has to keep small allocations around. This is especially the case if you're intentionally keeping "big" allocations on page boundaries.
Multi-thread aware applications often have to take special care to avoid contention so that memory allocation is not a bottleneck. If you're realloc'ing a chunk that was allocated in a different thread, it might be non-blocking to give you a new chunk (with copy) and defer on freeing the old pointer, but allowing you to keep the same pointer would block this or some other thread.
A debugging allocator will intentionally return different pointers to make sure the program doesn't incorrectly hang onto an old pointer by mistake: this breaks things sooner rather than later.
I cannot think of a case where a "please try" statement in the standard would change any decisions of any library designer. If keeping the same pointer makes sense for a given implementation, then of course they're going to use it, but if there's an overriding technical reason not to, then they won't.
I'm also not sure I can think of a case where this nudge would make any difference to the user of a library either. You still have to code it to account for all the cases, even one that "tries", so it's not like it's going to save you any code.
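One way to "account for all the cases" is to never keep raw pointers into a buffer that may be reallocated, and to store offsets instead. The sketch below assumes a hypothetical `struct textbuf` and helper `textbuf_append`; the growth policy (doubling from 16 bytes) is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer. 'cursor' is an offset, not a pointer,
 * so it stays meaningful even if realloc moves 'data'. */
struct textbuf {
    char  *data;
    size_t cap;
    size_t cursor;   /* index of the next byte to write; always <= cap */
};

/* Append n bytes, growing the buffer if needed. Returns 0 on success, -1 on failure. */
int textbuf_append(struct textbuf *tb, const char *src, size_t n)
{
    assert(tb != NULL && src != NULL);
    if (n == 0)
        return 0;
    if (n > tb->cap - tb->cursor) {
        size_t new_cap = tb->cap ? tb->cap : 16;
        while (n > new_cap - tb->cursor)
            new_cap *= 2;               /* overflow checks omitted in this sketch */
        char *tmp = realloc(tb->data, new_cap);
        if (tmp == NULL)
            return -1;                  /* tb is unchanged and still usable */
        tb->data = tmp;                 /* may or may not equal the old pointer */
        tb->cap = new_cap;
    }
    memcpy(tb->data + tb->cursor, src, n);
    tb->cursor += n;
    return 0;
}
```

The code is identical whether realloc resizes in place or moves the block, which is why a "please try" clause in the standard would not simplify it.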
In the end, this is an implementation detail that a standard would never handcuff an implementer about, and the library will be judged on its own merits (performance, codesize, reliability, etc.) and this is just one aspect.
You can always code your own allocator if you really need that behavior for some reason.
EDIT: Another reason why an allocator would want to return a different pointer even if reallocing the same size: reducing memory fragmentation.
If my realloc request comes in at a time when there's a lot of free space on either side, the allocator could realize: I could extend this chunk in place (fast and easy), or I could move it some other place and coalesce what's left behind into a much larger free block.
This has been a nagging problem for a customer-written project: written ages ago in 32-bit Delphi, it runs for days at a time with a lot of memory pressure, and eventually the memory is so fragmented that it's unavailable to service modest requests even though there are many hundreds of megabytes free.
Ref: Are concatenated Delphi strings held in a hidden temporary variable that retains a reference to the string?
There's little I can do about this in Delphi, but in C it's very easy to imagine "aggressive avoiding of memory fragmentation" being a property of an allocator.
A standard is a general-purpose description. It doesn't need to say whether an implementation should try to do something; it describes the general behavior of a function.
It is clear and logical that, for optimization purposes, an implementation will try to resize the buffer in place, but since there is no guarantee that this can happen, the standard can't require it.

Is the bookkeeping of allocated memory blocks redundant?

When we use malloc() we provide a size in bytes.
When we use free() we provide nothing.
This is because the OS of course knows about it already, it must have stored the information somewhere.
By the way, our software must also remember how much memory it has requested, so that we can (for instance) safely iterate starting from the pointer and going forward.
So, my question is: isn't this redundant? Can't we simply ask the OS the size of the memory pointed by a given pointer since it knows it? And if not, why not?
When we use malloc() we provide a size in bytes. When we use free() we
provide nothing. This is because the OS of course knows about it
already, it must have stored the information somewhere.
Even though it gives you memory and it keeps track of what memory range belongs to your process, the OS doesn't concern itself with the internal details of your memory. malloc stores the size of the allocated chunk in its own place, also reserved inside your process (usually, it's a few bytes before the logical address returned by malloc). free simply reads that reserved information and deallocates automatically.
By the way, our software must also remember how much memory it
has requested, so that we can (for instance) safely iterate starting
from the pointer and going forward.
So, my question is: isn't this redundant? Can't we simply ask the OS
the size of the memory pointed by a given pointer since it knows it?
And if not, why not?
Given the above, it is redundant to store that information, yes. But you pretty much have to store it, because the way malloc does its book-keeping is an implementation detail.
If you know how your particular implementation works and you want to take that risk for your software, you are free (no pun intended) to do it. If you don't want to base your logic on an implementation detail (and you'd be right not to want to), you'll have to do this redundant book-keeping side-by-side with malloc's own book-keeping.
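A minimal sketch of that side-by-side bookkeeping: keep the pointer and the element count together in one struct, so the size never has to be recovered from the allocator. The names `struct int_array`, `int_array_make` and `int_array_sum` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "fat" array: the pointer and its element count travel together. */
struct int_array {
    int   *items;
    size_t count;
};

/* Allocate a zero-initialised array of 'count' ints; items is NULL on failure. */
struct int_array int_array_make(size_t count)
{
    struct int_array a = { calloc(count, sizeof(int)), count };
    if (a.items == NULL)
        a.count = 0;    /* never advertise elements we do not have */
    return a;
}

/* Sum every element, bounded by the count we recorded ourselves. */
long int_array_sum(const struct int_array *a)
{
    assert(a != NULL);
    long total = 0;
    for (size_t i = 0; i < a->count; i++)
        total += a->items[i];
    return total;
}
```

The caller releases the memory with `free(a.items)` as usual; the count is simply discarded along with the struct.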
No, it's not redundant. malloc() manages, in cooperation with free() and a few other functions, a zillion tiny, individually addressed blocks within relatively large blocks which are generally obtained with sbrk(). The OS only knows about the large range(s), and has no clue which tiny blocks within them are in use or not. To add to the differences, sbrk() only lets you move the end of your data segment, not split it into parts to free independently. Though one could allocate memory using sbrk exclusively, you would be unable to free arbitrary chunks for reuse, or coalesce smaller chunks into larger ones, or split chunks, without writing a bunch of bookkeeping code for this purpose, which ends up essentially being the same as writing malloc. Additionally, using malloc/free/... allows you to call sbrk only rarely, which is a performance bonus since sbrk is a system call with its own overhead.
When we use free() we provide nothing.
Not quite true; we provide the pointer that was returned by malloc.
Can't we simply ask the OS the size of the memory pointed by a given pointer since it knows it?
Nope. Pointers are simply addresses; apart from their type, they carry no information about the size of the object they point to. How malloc/calloc/realloc and free keep track of object sizes and allocated vs. free blocks is up to the individual implementation; they may reserve some space immediately before the allocated memory to store the size, they may build an internal map of addresses and sizes, or they may do something else completely.
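To illustrate the first of those strategies, here is a sketch of a wrapper that stores the size in a small header just before the bytes it hands out. This is not how any particular malloc is implemented; `sized_malloc`, `sized_size` and `sized_free` are invented names, and `max_align_t` assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed immediately before the user's bytes. The union pads it to the
 * alignment of max_align_t so the address handed out stays suitably aligned. */
union block_header {
    size_t      size;
    max_align_t align;
};

/* Hypothetical wrapper: allocate 'size' bytes and remember the size. */
void *sized_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(union block_header))
        return NULL;
    union block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;       /* user memory starts right after the header */
}

/* Recover the size recorded by sized_malloc. */
size_t sized_size(const void *p)
{
    assert(p != NULL);
    return ((const union block_header *)p - 1)->size;
}

/* Release a block from sized_malloc; like free(), only the pointer is needed. */
void sized_free(void *p)
{
    if (p != NULL)
        free((union block_header *)p - 1);
}
```

With a wrapper like this the program can ask for a block's size, but only for blocks it allocated through the wrapper, which is exactly the portability limit described above.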
It would be nice if you could query a pointer for the size of the object it points to; unfortunately, that's simply not a feature of the language.

Is there a standard C malloc like function that takes double pointer to avoid fragmentation?

The problem with the current malloc function is that it does not keep track of the variable that stores the returned memory location. As such, fragmentation can occur because it is harder to move memory around.
The MMU can solve this only to a certain extent. Let's say that instead malloc took a double pointer and kept track of the variable. Calls to free would then be allowed to move memory around and change the memory location.
It is highly unlikely that I am the first to think about this so I am wondering if there is a standard C function that does this or POSIX function?
I understand that this idea is not perfect. The program would have to pass around the same variable instead of copying it however it does solve the issue of fragmentation which does matter to me as I work with low memory devices.
Of course, the realloc() function will, if necessary, move a previously allocated block of memory to a new (larger) location. However, it does not necessarily impact fragmentation.
My solution (in C) has been (in low-memory conditions) to do my own memory management.
Though not the same mechanism, the closest thing to what you are speaking of are smart pointers as implemented with the boost libraries. However, they are built for C++.
Smart pointers are 'smart' in that they don't hang around after they aren't needed (and you don't have to free them) so they avoid most of the fragmentation problems you cite.
Keeping track of which variable points to which memory location cannot be done by just passing a pointer to a pointer. What if the address of the allocated memory is copied to another variable?
C is unlike Java that keeps track of object references. In your case, you may be better off managing memory on your own by preallocating a large chunk of memory and splitting it as needed, keeping track of usage, in brief, implementing your own memory management.
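One simple form of "preallocate a large chunk and split it as needed" is a fixed-size block pool. Because every block has the same size, any free block can satisfy any request, so the pool cannot fragment externally. The sketch below is single-threaded, and the block size and count are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCK_SIZE  32   /* bytes per block (assumed) */
#define POOL_BLOCK_COUNT 8    /* number of blocks (assumed) */

/* A free block stores the link to the next free block inside itself. */
union pool_block {
    union pool_block *next_free;
    max_align_t       align;                  /* keep blocks suitably aligned */
    unsigned char     bytes[POOL_BLOCK_SIZE];
};

static union pool_block  pool_storage[POOL_BLOCK_COUNT];  /* the preallocated chunk */
static union pool_block *pool_free_list;
static int               pool_ready;

/* Thread every block onto the free list. */
static void pool_init(void)
{
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next_free = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
    pool_ready = 1;
}

/* Hand out one block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    if (!pool_ready)
        pool_init();
    union pool_block *b = pool_free_list;
    if (b == NULL)
        return NULL;
    pool_free_list = b->next_free;
    return b;
}

/* Return a block to the pool. p must come from pool_alloc (or be NULL). */
void pool_free(void *p)
{
    if (p == NULL)
        return;
    union pool_block *b = p;
    assert(b >= pool_storage && b < pool_storage + POOL_BLOCK_COUNT);
    b->next_free = pool_free_list;
    pool_free_list = b;
}
```

The trade-off is internal waste: a 5-byte request still consumes a whole block, so the block size has to be chosen to match the objects the program actually allocates.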
The idea is not that attractive when you start thinking about implementation details. You can of course write a function that returns a double pointer. What about the intermediate pointer? Where should it be stored? The program itself clearly cannot store it, because the intermediate pointer should have exactly the same lifetime as the pointed-to memory. So the allocator itself should store and manage it. And the memory where the intermediate pointer lives is not movable. But the killer misfeature is that the scheme is not thread safe. As pointers are free to change under the hood, each pointer dereference now requires a lock.
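To make the "allocator stores the intermediate pointer" design concrete, here is a rough single-threaded sketch in which the program holds small integer handles and the allocator owns a table of the real addresses. All names (`handle_t`, `handle_alloc`, `handle_deref`, `handle_move`, `handle_free`) are invented, and "moving" is simulated with malloc plus memcpy rather than real compaction.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HANDLES 16            /* table capacity (assumed) */

typedef int handle_t;             /* index into the table; -1 means "no handle" */

/* The table itself never moves; only the blocks it points to do. */
static struct { void *ptr; size_t size; } handle_table[MAX_HANDLES];

/* Allocate 'size' bytes (size > 0 assumed) and return a handle, or -1 on failure. */
handle_t handle_alloc(size_t size)
{
    for (handle_t h = 0; h < MAX_HANDLES; h++) {
        if (handle_table[h].ptr == NULL) {
            handle_table[h].ptr = malloc(size);
            if (handle_table[h].ptr == NULL)
                return -1;
            handle_table[h].size = size;
            return h;
        }
    }
    return -1;                    /* table full */
}

/* Look up the current address. The result is only valid until the next move. */
void *handle_deref(handle_t h)
{
    assert(h >= 0 && h < MAX_HANDLES);
    return handle_table[h].ptr;
}

/* Simulate compaction: copy the object elsewhere and update the one table entry. */
int handle_move(handle_t h)
{
    assert(h >= 0 && h < MAX_HANDLES && handle_table[h].ptr != NULL);
    void *fresh = malloc(handle_table[h].size);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, handle_table[h].ptr, handle_table[h].size);
    free(handle_table[h].ptr);
    handle_table[h].ptr = fresh;
    return 0;
}

/* Release the block and make the table slot reusable. */
void handle_free(handle_t h)
{
    assert(h >= 0 && h < MAX_HANDLES);
    free(handle_table[h].ptr);
    handle_table[h].ptr = NULL;
}
```

The costs named above show up directly: every access goes through `handle_deref`, a raw pointer cached across a move becomes dangling, and a multithreaded version would need to lock around each dereference.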

Why can I still access a member of a struct after the pointer to it is freed?

If I define a structure...
struct LinkNode
{
    int node_val;
    struct LinkNode *next_node;
};
and then create a pointer to it...
struct LinkNode *mynode = malloc(sizeof(struct LinkNode));
...and then finally free() it...
free(mynode);
...I can still access the 'next_node' member of the structure.
mynode->next_node
My question is this: which piece of the underlying mechanics keeps track of the fact that this block of memory is supposed to represent a struct LinkNode? I'm a newbie to C, and I expected that after I used free() on the pointer to my LinkNode, that I would no longer be able to access the members of that struct. I expected some sort of 'no longer available' warning.
I would love to know more about how the underlying process works.
The compiled program no longer has any knowledge about struct LinkNode or a field named next_node, or anything like that. Any names are completely gone from the compiled program. The compiled program operates in terms of numerical values, which can play the roles of memory addresses, offsets, indices and so on.
In your example, when you read mynode->next_node in the source code of your program, it is compiled into machine code that simply reads the 4-byte numerical value from some reserved memory location (known as variable mynode in your source code), adds 4 to it (which is the offset of the next_node field) and reads the 4-byte value at the resultant address (which is mynode->next_node). This code, as you can see, operates in terms of integer values: addresses, sizes and offsets. It does not care about any names, like LinkNode or next_node. It does not care whether the memory is allocated and/or freed. It does not care whether any of these accesses are legal or not.
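The same "base address plus offset" arithmetic can be written out in C itself with offsetof. This is only an illustration of what the member access amounts to, not what any particular compiler emits; `read_next_by_offset` is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct LinkNode {
    int node_val;
    struct LinkNode *next_node;
};

/* Read node->next_node by hand: start at the struct's address, step forward by
 * the field's offset, and copy out a pointer-sized value. */
struct LinkNode *read_next_by_offset(const struct LinkNode *node)
{
    assert(node != NULL);
    const unsigned char *base = (const unsigned char *)node;
    struct LinkNode *next;
    memcpy(&next, base + offsetof(struct LinkNode, next_node), sizeof next);
    return next;
}
```

Nothing in this function knows or checks whether the memory behind `node` is still allocated, which is the point being made above.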
(The constant 4 I repeatedly use in the above example is specific for 32-bit platforms. On 64-bit platforms it would be replaced by 8 in most (or all) instances.)
If an attempt is made to read memory that has been freed, these accesses might crash your program. Or they might not. It is a matter of pure luck. As far as the language is concerned, the behavior is undefined.
There isn't and you can't. This is a classic case of undefined behavior.
When you have undefined behavior, anything can happen. It may even appear to work, only to randomly crash a year later.
It works by pure luck, because the freed memory has not yet been overwritten by something else. Once you free the memory, it is your responsibility to avoid using it again.
No part of the underlying memory keeps track of it. It's just the semantics the programming language gives to the chunk of memory. You could, for example, cast it to something completely different and still access the same memory region. The catch is that this is more likely to lead to errors; in particular, type safety will be gone. In your case, just because you called free doesn't mean that the underlying memory changes at all. The allocator merely records (think of a flag) that this region is free again.
Think about it this way: the free function is something like a "minimal" memory management system. If the call required more than setting a flag, it would introduce unnecessary overhead. Likewise, when you access the member, the runtime could check whether the flag for this memory region is set to "free" or "in use". But that's overhead again.
Of course that doesn't mean it wouldn't make sense to do those kinds of things. It would avoid a lot of security holes, and it is done, for example, in .NET and Java. But those runtimes are much younger than C, and we have many more resources these days.
When your compiler translates your C code into executable machine code, a lot of information is thrown away, including type information. Where you write:
int x = 42;
the generated code just copies a certain bit pattern into a certain chunk of memory (a chunk that might typically be 4 bytes). You can't tell by examining the machine code that the chunk of memory is an object of type int.
Similarly, when you write:
if (mynode->next_node == NULL) { /* ... */ }
the generated code will fetch a pointer sized chunk of memory by dereferencing another pointer-sized chunk of memory, and compare the result to the system's representation of a null pointer (typically all-bits-zero). The generated code doesn't directly reflect the fact that next_node is a member of a struct, or anything about how the struct was allocated or whether it still exists.
The compiler can check a lot of things at compile time, but it doesn't necessarily generate code to perform checks at execution time. It's up to you as a programmer to avoid making errors in the first place.
In this specific case, after the call to free, mynode has an indeterminate value. It doesn't point to any valid object, but there's no requirement for the implementation to do anything with that knowledge. Calling free doesn't destroy the allocated memory, it merely makes it available for allocation by future calls to malloc.
There are a number of ways that an implementation could perform checks like this, and trigger a run-time error if you dereference a pointer after freeing it. But such checks are not required by the C language, and they're generally not implemented because (a) they would be quite expensive, making your program run more slowly, and (b) checks can't catch all errors anyway.
C is defined so that memory allocation and pointer manipulation will work correctly if your program does everything right. If you make certain errors that can be detected at compile time, the compiler can diagnose them. For example, assigning a pointer value to an integer object requires at least a compile-time warning. But other errors, such as dereferencing a freed pointer, cause your program to have undefined behavior. It's up to you, as a programmer, to avoid making those errors in the first place. If you fail, you're on your own.
Of course there are tools that can help. Valgrind is one; clever optimizing compilers are another. (Enabling optimization causes the compiler to perform more analysis of your code, and that can often enable it to diagnose more errors.) But ultimately C is not a language that holds your hand. It's a sharp tool -- and one that can be used to build safer tools, such as interpreted languages that do more run-time checking.
You need to assign NULL to mynode itself:
mynode = NULL;
after freeing the memory, so that it indicates you are no longer using the allocated memory. (Do not write to mynode->next_node after the free; that would itself access freed memory.)
Without assigning NULL, mynode still points to the previously freed memory location.
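A common defensive habit is to wrap the free and the reset of the pointer variable in one helper, so the two steps cannot be separated. The sketch below uses a hypothetical `free_node` function; it clears only the variable it is given, not other copies of the pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct LinkNode {
    int node_val;
    struct LinkNode *next_node;
};

/* Hypothetical helper: free *node_ref and reset the caller's pointer to NULL. */
void free_node(struct LinkNode **node_ref)
{
    assert(node_ref != NULL);
    free(*node_ref);    /* free(NULL) is a no-op, so calling this twice is harmless */
    *node_ref = NULL;   /* only this one variable is cleared; other copies still dangle */
}
```

After `free_node(&mynode)`, a later `mynode->next_node` dereferences a null pointer, which on most hosted platforms fails immediately instead of silently reading stale data, although the language still leaves that behavior undefined.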

Finding roots for garbage collection in C

I'm trying to implement a simple mark and sweep garbage collector in C. The first step of the algorithm is finding the roots. So my question is how can I find the roots in a C program?
In programs that use malloc, I'll substitute a custom allocator. This custom allocator is all that will be called from the C program, plus maybe a custom init().
How does the garbage collector know what all the pointers (roots) in the program are? Also, given a pointer to a custom type, how does it find all the pointers inside that object?
For example, suppose there's a pointer p pointing to a list object, which has another pointer inside it, say q. How does the garbage collector know about q, so that it can mark it?
Update: How about if I send all the pointer names and types to GC when I init it? Similarly, the structure of different types can also be sent so that GC can traverse the tree. Is this even a sane idea or am I just going crazy?
First off, garbage collectors in C, without extensive compiler and OS support, have to be conservative, because you cannot distinguish between a legitimate pointer and an integer that happens to have a value that looks like a pointer. And even conservative garbage collectors are hard to implement. Like, really hard. And often, you will need to constrain the language in order to get something acceptable: for instance, it might be impossible to correctly collect memory if pointers are hidden or obfuscated. If you allocate 100 bytes and only keep a pointer to the tenth byte of the allocation, your GC is unlikely to figure out that you still need the block since it will see no reference to the beginning. Another very important constraint to control is the memory alignment: if pointers can be on unaligned memory, your collector can be slowed down by a factor of 10x or worse.
To find roots, you need to know where your stacks start, and where your stacks end. Notice the plural form: each thread has its own stack, and you might need to account for that, depending on your objectives. To know where a stack starts, without entering into platform-specific details (that I probably wouldn't be able to provide anyways), you can use assembly code inside the main function of the current thread (just main in a non-threaded executable) to query the stack register (esp on x86, rsp on x86_64 to name those two only). Gcc and clang support a language extension that lets you assign a variable permanently to a register, which should make it easy for you:
register void* stack asm("esp"); // replace esp with the name of your stack reg
(register is a standard language keyword that is most of the time ignored by today's compilers, but coupled with asm("register_name"), it lets you do some nasty stuff.)
To ensure you don't forget important roots, you should defer the actual work of the main function to another one. (On x86 platforms, you can also query ebp/rbp, the stack frame base pointers, instead, and still do your actual work in the main function.)
int main(int argc, const char** argv, const char** envp)
{
    register void* stack asm("esp");
    // put stack somewhere
    return do_main(argc, argv, envp);
}
Once you enter your GC to do collection, you need to query the current stack pointer for the thread you've interrupted. You will need design-specific and/or platform-specific calls for that (though if you get something to execute on the same thread, the technique above will still work).
The actual hunt for roots starts now. Good news: most ABIs will require stack frames to be aligned on a boundary greater than the size of a pointer, which means that if you trust every pointer to be on aligned memory, you can treat your whole stack as an intptr_t* and check if any pattern inside looks like any of your managed pointers.
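The scanning step itself can be sketched portably if the range to scan is passed in as an array of words. The code below assumes a hypothetical table of managed blocks (`struct managed_block`) maintained by your allocator, and a made-up function `scan_range`; obtaining the real stack bounds remains platform-specific and is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One managed allocation known to the collector (hypothetical layout). */
struct managed_block {
    uintptr_t start;   /* address of the first byte */
    size_t    size;    /* length in bytes */
    int       marked;  /* set when something appears to reference the block */
};

/* Conservatively scan the aligned words in [begin, end): any word whose value
 * falls inside a managed block marks that block.
 * Returns the number of blocks newly marked by this call. */
size_t scan_range(const uintptr_t *begin, const uintptr_t *end,
                  struct managed_block *blocks, size_t nblocks)
{
    assert(begin <= end);
    size_t newly_marked = 0;
    for (const uintptr_t *word = begin; word < end; word++) {
        for (size_t i = 0; i < nblocks; i++) {
            struct managed_block *b = &blocks[i];
            /* Accept interior pointers too: start <= value < start + size. */
            if (*word >= b->start && *word - b->start < b->size && !b->marked) {
                b->marked = 1;
                newly_marked++;
            }
        }
    }
    return newly_marked;
}
```

Accepting interior pointers keeps a block alive when only a pointer into its middle survives, at the price of more false positives from integers that happen to land in the range. A real collector would replace the linear search over blocks with a sorted table or a similar index.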
Obviously, there are other roots. Global variables can (theoretically) be roots, and fields inside structures can be roots too. Registers can also have pointers to objects. You need to separately account for global variables that can be roots (or forbid that altogether, which isn't a bad idea in my opinion) because automatic discovery of those would be hard (at least, I wouldn't know how to do it on any platform).
These roots can lead to references on the heap, where things can go awry if you don't take care.
Since not all platforms provide malloc introspection (as far as I know), you need to implement the concept of scanned memory--that is, memory that your GC knows about. It needs to know at least the address and the size of each such allocation. When you get a reference to one of these, you simply scan it for pointers, just like you did for the stack. (This means that you should take care that your pointers are aligned. This is normally the case if you let your compiler do its job, but you still need to be careful when you use third-party APIs.)
This also means that you cannot put references to collectable memory to places where the GC can't reach it. And this is where it hurts the most and where you need to be extra-careful. Otherwise, if your platform supports malloc introspection, you can easily tell the size of each allocation you get a pointer to and make sure you don't overrun them.
This just scratches the surface of the topic. Garbage collectors are extremely complex, even when single-threaded. When you add threads to the mix, you enter a whole new world of hurt.
Apple has implemented such a conservative GC for the Objective-C language and dubbed it libauto. They have open-sourced it, along with a good part of the low-level technologies of Mac OS X, and you can find the source here.
I can only quote Hot Licks here: good luck!
Okay, before I go even further, I forgot something very important: compiler optimizations can break the GC. If your compiler is not aware of your GC, it can very well never put certain roots on the stack (only dealing with them in registers), and you're going to miss them. This is not too problematic for single-threaded programs if you can inspect registers, but again, a huge mess for multithreaded programs.
Also be very careful about the interruptibility of allocations: you must make sure that your GC cannot kick in while you're returning a new pointer because it could collect it right before it is assigned to a root, and when your program resumes it would assign that new dangling pointer to your program.
And here's an update to address the edit:
Update: How about if I send all the pointer names and types to GC when
I init it? Similarly, the structure of different types can also be
sent so that GC can traverse the tree. Is this even a sane idea or am
I just going crazy?
I guess you could allocate your memory then register it with the GC to tell it that it should be a managed resource. That would solve the interruptibility problem. But then, be careful about what you send to third-party libraries, because if they keep a reference to it, your GC might not be able to detect it since they won't register their data structures with your GC.
And you likely won't be able to do that with roots on the stack.
The roots are basically all static and automatic object pointers. Static pointers would be linked inside the load modules. Automatic pointers must be found by scanning stack frames. Of course, you have no idea where in the stack frames the automatic pointers are.
Once you have the roots you need to scan objects and find all the pointers inside them. (This would include pointer arrays.) For that you need to identify the class object and somehow extract from it information about pointer locations. Of course, in C many objects are not virtual and do not have a class pointer within them.
Good luck!!
Added: One technique that could vaguely make your quest possible is "conservative" garbage collection. Since you intend to have your own allocator, you can (somehow) keep track of allocation sizes and locations, so you can pick any pointer-sized chunk out of storage and ask "Might this possibly be a pointer to one of my objects?" You can, of course, never know for sure, since random data might "look like" a pointer to one of your objects, but still you can, through this mechanism, scan a chunk of storage (like a frame in the call stack, or an individual object) and identify all the possible objects it might address.
With a conservative collector you cannot safely do object relocation/compaction (where you modify pointers to objects as you move them) since you might accidentally modify "random" data that looks like an object pointer but is in fact meaningful data to some application. But you can identify unused objects and free up the space they occupy for reuse. With proper design it's possible to have a very effective non-compacting GC.
(However, if your version of C allows unaligned pointers, scanning could be very slow, since you'd have to try every possible byte alignment.)
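The non-compacting "identify unused objects and free them" step can be sketched as a sweep over a list of every allocation the custom allocator has made. This is a toy under stated assumptions: `gc_alloc`, `gc_mark` and `gc_sweep` are invented names, marking is done explicitly by the caller rather than by scanning, and `max_align_t` assumes C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Collector header stored in front of each payload; the union keeps the
 * payload that follows it suitably aligned. */
union gc_header {
    struct { union gc_header *next; size_t size; int marked; } info;
    max_align_t align;
};

static union gc_header *gc_objects;   /* list of every allocation still owned by the GC */

/* Allocate 'size' payload bytes and link the object into the list. */
void *gc_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(union gc_header))
        return NULL;
    union gc_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->info.next = gc_objects;
    h->info.size = size;
    h->info.marked = 0;
    gc_objects = h;
    return h + 1;
}

/* Mark an object as reachable; payload must be a pointer returned by gc_alloc. */
void gc_mark(void *payload)
{
    assert(payload != NULL);
    ((union gc_header *)payload - 1)->info.marked = 1;
}

/* Sweep: free every unmarked object and clear the marks on survivors.
 * Objects never move, so no pointer anywhere has to be rewritten.
 * Returns the number of objects freed. */
size_t gc_sweep(void)
{
    size_t freed = 0;
    union gc_header **link = &gc_objects;
    while (*link != NULL) {
        union gc_header *h = *link;
        if (h->info.marked) {
            h->info.marked = 0;         /* reset for the next collection cycle */
            link = &h->info.next;
        } else {
            *link = h->info.next;       /* unlink, then release */
            free(h);
            freed++;
        }
    }
    return freed;
}
```

In a conservative collector the calls to `gc_mark` would come from scanning the stack, registers, globals and marked objects for values that look like pointers into managed blocks, as described above.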

Resources