How to detect where a block of memory was allocated? - c

A block of memory can be allocated statically, in the stack or in the heap. I want to know a way to detect if a pointer points to the heap. I work with Windows and Linux and it is not a problem a different solution for each OS. I use GCC and Mingw.
If I could know where the heap begins and where it ends, I think the problem can be solved. I think that I can detect the bottom and the top of the stack in order to know if the block is in the stack, but if there are multiple threads, then there are multiple stacks. Even I could to know where is the static memory, I think I will have problems with static memory blocks of shared libraries.
I think I will have a problem if the pointer does not point to the beginning of the block:
type* x = &(pointer[3]);

You can't.
You may try to allocate a memory on the heap at the beginning of you program, and compare the address to the pointer you want to free, but it will not be accurate in many of the cases. And what you might find and use on one platform after some research of its memory management, might not be relevant on the next.
An alternate way is to add a memory management module to your program, which will wrap the malloc, free etc. functions and will keep track of all allocated memory and will call free only if the pointer appears in his list. While this might seem like a lot of work to avoid memory leaks, I've found it very convenient many times.
EDIT
As mentioned in comments, the best way to decide is simple - free it in a place where you know if it's was located on the heap or not. I cannot tell you how easy it is in your case, but usually it shouldn't be too hard, many programs / programers did it before, and I doubt someone actually tried to check where the memory was allocated at.

Related

Is there a standard C malloc like function that takes double pointer to avoid fragmentation?

The problem with the current malloc function is that is does not keep track of the variable that stores the returned memory location. As such fragmentation can occur because it is harder to move around memory.
The MMU can solve this only to a certain extent. Lets say that instead malloc took a double pointer and kept track of the variable. Calls to free would allow free to move around memory and change the memory location.
It is highly unlikely that I am the first to think about this so I am wondering if there is a standard C function that does this or POSIX function?
I understand that this idea is not perfect. The program would have to pass around the same variable instead of copying it however it does solve the issue of fragmentation which does matter to me as I work with low memory devices.
Of course, the realloc() function will, if necessary, move a previously allocated block of memory to a new (larger) location. However, it does not necessarily impact fragmentation.
My solution (in C) has been (in low-memory conditions) to do my own memory management.
Though not the same mechanism, the closest thing to what you are speaking of are smart pointers as implemented with the boost libraries. However, they are built for C++.
Smart pointers are 'smart' in that they don't hang around after they aren't needed (and you don't have to free them) so they avoid most of the fragmentation problems you cite.
Keeping track of which variable points to which memory location cannot be done by just passing a pointer to a pointer. What if the address of the allocated memory is copied to another variable?
C is unlike Java that keeps track of object references. In your case, you may be better off managing memory on your own by preallocating a large chunk of memory and splitting it as needed, keeping track of usage, in brief, implementing your own memory management.
The idea is not that attractive when you start thinking about implementation details. You can of course write a function that returns a double pointer. What about the intermediate pointer? Where should it be stored? The program itself clearly cannot store it, because the intermediate oointer should have exactly the same lifetime as the pointed-to memory. So the allocator itself should store and manage it. And the memory where the intermediate pointer lives is not movable. But the killer misfeature is that the scheme is not thread safe. As pointers are free to change under the hood, each pointer dereference now requires a lock.

Debugging memory leak issues without any tool

Interviewer - If you have no tools to check how would you detect memory leak problems?
Answer - I will read the code and see if all the memory I have allocated has been freed by me in the code itself.
Interviewer wasn't satisfied. Is there any other way to do so?
For all the implementation defined below, one needs to write wrappers for malloc() & free() functions.
To keep things simple, keep track of count of malloc() & free(). If not equal then you have a memory leak.
A better version would be to keep track of the addresses malloc()'ed & free()'ed this way you can identify which addresses are malloc()'ed but not free()'ed. But this again, won't help much either, since you can't relate the addresses to source code, especially it becomes a challenge when you have a large source code.
So here, you can add one more feature to it. For eg, I wrote a similar tool for FreeBSD Kernel, you can modify the malloc() call to store the module/file information (give each module/file a no, you can #define it in some header), the stack trace of the function calls leading to this malloc() and store it in a data structure, along side the above information whenever a malloc() or free() is called. Use addresses returned by malloc() to match with it free(). So, when their's a memory leak, you have information about what addresses were not free()'ed in which file, what were the exact functions called (through the stack trace) to pin point it.
The way, this tool worked was, on a crash, I used to get a core-dump. I had defined globals (this data structure where I was collecting data) in kernel memory space, which I could access using gdb and retrieve the information.
Edit:
Recently while debugging a memeory leak in linux kernel, I came across this tool called kmemleak which implements a similar algorithm I described in point#3 above. Read under the Basic Algorithm section here: https://www.kernel.org/doc/Documentation/kmemleak.txt
My response when I had to do this for real was to build tools... a debugging heap layer, wrapped around the C heap, and macros to switch code to running against those calls rather than accessing the normal heap library directly. That layer included some fencepost logic to detect array bounds violations, some instrumentation to monitor what the heap was doing, optionally some recordkeeping of exactly who allocated and freed each block...
Another approach, of course, is "divide and conquer". Build unit tests to try to narrow down which operations are causing the leak, then to subdivide that code further.
Depending on what "no tools" means, core dumps are also sometimes useful; seeing the content of the heap may tell you what's being leaked, for example.
And so on....

C - Design your own free( ) function

Today, I appeared for an interview and the interviewer asked me this,
Tell me the steps how will you design your own free( ) function for
deallocate the allocated memory.
How can it be more efficient than C's default free() function ? What can you conclude ?
I was confused, couldn't think of the way to design.
What do you think guys ?
EDIT : Since we need to know about how malloc() works, can you tell me the steps to write our own malloc() function
That's actually a pretty vague question, and that's probably why you got confused. Does he mean, given an existing malloc implementation, how would you go about trying to develop a more efficient way to free the underlying memory? Or was he expecting you to start discussing different kinds of malloc implementations and their benefits and problems? Did he expect you to know how virtual memory functions on the x86 architecture?
Also, by more efficient, does he mean more space efficient or more time efficient? Does free() have to be deterministic? Does it have to return as much memory to the OS as possible because it's in a low-memory, multi-tasking environment? What's our criteria here?
It's hard to say where to start with a vague question like that, other than to start asking your own questions to get clarification. After all, in order to design your own free function, you first have to know how malloc is implemented. So chances are, the question was really about whether or not you knew anything about how malloc can be implemented.
If you're not familiar with the internals of memory management, the easiest way to get started with understanding how malloc is implemented is to first write your own.
Check out this IBM DeveloperWorks article called "Inside Memory Management" for starters.
But before you can write your own malloc/free, you first need memory to allocate/free. Unfortunately, in a protected mode OS, you can't directly address the memory on the machine. So how do you get it?
You ask the OS for it. With the virtual memory features of the x86, any piece of RAM or swap memory can be mapped to a memory address by the OS. What your program sees as memory could be physically fragmented throughout the entire system, but thanks to the kernel's virtual memory manager, it all looks the same.
The kernel usually provides system calls that allow you to map in additional memory for your process. On older UNIX OS's this was usually brk/sbrk to grow heap memory onto the edge of your process or shrink it off, but a lot of systems also provide mmap/munmap to simply map a large block of heap memory in. It's only once you have access to a large, contiguous looking block of memory that you need malloc/free to manage it.
Once your process has some heap memory available to it, it's all about splitting it into chunks, with each chunk containing its own meta information about its size and position and whether or not it's allocated, and then managing those chunks. A simple list of structs, each containing some fields for meta information and a large array of bytes, could work, in which case malloc has to run through the list until if finds a large enough unallocated chunk (or chunks it can combine), and then map in more memory if it can't find a big enough chunk. Once you find a chunk, you just return a pointer to the data. free() can then use that pointer to reverse back a few bytes to the member fields that exist in the structure, which it can then modify (i.e. marking chunk.allocated = false;). If there's enough unallocated chunks at the end of your list, you can even remove them from the list and unmap or shrink that memory off your process's heap.
That's a real simple method of implementing malloc though. As you can imagine, there's a lot of possible ways of splitting your memory into chunks and then managing those chunks. There's as many ways as there are data structures and algorithms. They're all designed for different purposes too, like limiting fragmentation due to small, allocated chunks mixed with small, unallocated chunks, or ensuring that malloc and free run fast (or sometimes even more slowly, but predictably slowly). There's dlmalloc, ptmalloc, jemalloc, Hoard's malloc, and many more out there, and many of them are quite small and succinct, so don't be afraid to read them. If I remember correctly, "The C Programming Language" by Kernighan and Ritchie even uses a simple malloc implementation as one of their examples.
You can't blindly design free() without knowing how malloc() works under the hood because your implementation of free() would need to know how to manipulate the bookkeeping data and that's impossible without knowing how malloc() is implemented.
So an unswerable question could be how you would design malloc() and free() instead which is not a trivial question but you could answer it partially for example by proposing some very simple implementation of a memory pool that would not be equivalent to malloc() of course but would indicate your presence of knowledge.
One common approach when you only have access to user space (generally known as memory pool) is to get a large chunk of memory from the OS on application start-up. Your malloc needs to check which areas of the right size of that pool are still free (through some data structure) and hand out pointers to that memory. Your free needs to mark the memory as free again in the data structure and possibly needs to check for fragmentation of the pool.
The benefits are that you can do allocation in nearly constant time, the drawback is that your application consumes more memory than actually is needed.
Tell me the steps how will you design your own free( ) function for deallocate the allocated memory.
#include <stdlib.h>
#undef free
#define free(X) my_free(X)
inline void my_free(void *ptr) { }
How can it be more efficient than C's default free() function ?
It is extremely fast, requiring zero machine cycles. It also makes use-after-free bugs go away. It's a very useful free function for use in programs which are instantiated as short-lived batch processes; it can usefully be deployed in some production situations.
What can you conclude ?
I really want this job, but in another company.
Memory usage patterns could be a factor. A default implementation of free can't assume anything about how often you allocate/deallocate and what sizes you allocate when you do.
For example, if you frequently allocate and deallocate objects that are of similar size, you could gain speed, memory efficiency, and reduced fragmentation by using a memory pool.
EDIT: as sharptooth noted, only makes sense to design free and malloc together. So the first thing would be to figure out how malloc is implemented.
malloc and free only have a meaning if your app is to work on top of an OS. If you would like to write your own memory management functions you would have to know how to request the memory from that specific OS or you could reserve the heap memory right away using existing malloc and then use your own functions to distribute/redistribute the allocated memory through out your app
There is an architecture that malloc and free are supposed to adhere to -- essentially a class architecture permitting different strategies to coexist. Then the version of free that is executed corresponds to the version of malloc used.
However, I'm not sure how often this architecture is observed.
The knowledge of working of malloc() is necessary to implement free(). You can find a implementation of malloc() and free() using the sbrk() system call in K&R The C Programming Language Chapter 8, Section 8.7 "Example--A Storage Allocator" pp.185-189.

Where do malloc() and free() store allocated sizes and addresses?

Where do malloc() and free() store the allocated addresses and their sizes (Linux GCC)? I've read that some implementations store them somewhere before the actual allocated memory, but I could not confirm that in my tests.
The background, maybe someone has another tip for this:
I'm experimenting a little bit with analyzing the heap memory of a process in order to determine the current value of a string in the other process.
Accessing the process heap memory and strolling through it is no problem. However, because the value of the string changes and the process allocates a new part of the memory each time, the string's address changes. Because the string has a fixed format it's still easy to find, but after a few changes the old versions of the string are still in the heap memory (probably freed, but still not reused / overwritten) and thus I'm not able to tell which string is the current one.
So, in order to still find the current one I want to check if a string I find in the memory is still used by comparing its address against the addresses malloc() and free() know about.
ciao,
Elmar
There are lots of ways in which malloc/free can store the size of the memory area. For example, it might be stored just before the area returned by malloc. Or it might be stored in a lookup table elsewhere. Or it might be stored implicitly: some areas might be reserved for specific sizes of allocations.
To find out how the C library in Linux (glibc) does this, get the source code from http://ftp.gnu.org/gnu/glibc/ and look at the malloc/malloc.c file. There is some documentation at the top, and it refers to A Memory Allocator by Doug Lea.
This is up to the implementation of the standard library, of course. So your best bet would probably be to dig through the source of the library (glibc is the default on Linux) and see if you can figure it out. It is probably not going to be trivial.

Checking if something was malloced

Given a pointer to some variable.. is there a way to check whether it was statically or dynamically allocated??
Quoting from your comment:
im making a method that will basically get rid of a struct. it has a data member which is a pointer to something that may or may not be malloced.. depending on which one, i would like to free it
The correct way is to add another member to the struct: a pointer to a deallocation function.
It is not just static versus dynamic allocation. There are several possible allocators, of which malloc() is just one.
On Unix-like systems, it could be:
A static variable
On the stack
On the stack but dynamically allocated (i.e. alloca())
On the heap, allocated with malloc()
On the heap, allocated with new
On the heap, in the middle of an array allocated with new[]
On the heap, within a struct allocated with malloc()
On the heap, within a base class of an object allocated with new
Allocated with mmap
Allocated with a custom allocator
Many more options, including several combinations and variations of the above
On Windows, you also have several runtimes, LocalAlloc, GlobalAlloc, HeapAlloc (with several heaps which you can create easily), and so on.
You must always release memory with the correct release function for the allocator you used. So, either the part of the program responsible for allocating the memory should also free the memory, or you must pass the correct release function (or a wrapper around it) to the code which will free the memory.
You can also avoid the whole issue by either requiring the pointer to always be allocated with a specific allocator or by providing the allocator yourself (in the form of a function to allocate the memory and possibly a function to release it). If you provide the allocator yourself, you can even use tricks (like tagged pointers) to allow one to also use static allocation (but I will not go into the details of this approach here).
Raymond Chen has a blog post about it (Windows-centric, but the concepts are the same everywhere): Allocating and freeing memory across module boundaries
The ACE library does this all over the place. You may be able to check how they do it. In general you probably shouldn't need to do this in the first place though...
Since the heap, the stack, and the static data area generally occupy different ranges of memory, it is possible with intimate knowledge of the process memory map, to look at the address and determine which allocation area it is in. This technique is both architecture and compiler specific, so it makes porting your code more difficult.
Most libc malloc implementations work by storing a header before each returned memory block which has fields (to be used by the free() call) which has information about the size of the block, as well as a 'magic' value. This magic value is to protect against the user accidently deleting a pointer which wasn't alloc'd (or freeing a block which was overwritten by the user). It's very system specific so you'd have to look at the implementation of your libc library to see exactly what magic value was there.
Once you know that, you move the given pointer back to point at header and then check it for the magic value.
Can you hook into malloc() itself, like the malloc debuggers do, using LD_PRELOAD or something? If so, you could keep a table of all the allocated pointers and use that. Otherwise, I'm not sure. Is there a way to get at malloc's bookkeeping information?
Not as a standard feature.
A debug version of your malloc library might have some function to do this.
You can compare its address to something you know to be static, and say it's malloced only if it's far away, if you know the scope it should be coming from, but if its scope is unknown, you can't really trust that.
1.) Obtain a map file for the code u have.
2.) The underlying process/hardware target platform should have a memory map file which typically indicates - starting address of memory(stack, heap, global0, size of that block, read-write attributes of that memory block.
3.) After getting the address of the object(pointer variable) from the mao file in 1.) try to see which block that address falls into. u might get some idea.
=AD

Resources