Does malloc need OS support? - c

Memory management is a service provided by the underlying operating system. When we call malloc()/free() and there is no operating system running (for example, on a bare-metal embedded system), how are memory allocation and tracking handled?
There should be an entity that tracks which addresses are free and which are not. That's the OS's memory management. malloc()/free() would then have to call OS system calls. So no OS means no malloc()/free(). Am I wrong in assuming this?
Update:
All answers pointed out that malloc/free can either use static pool allocation (when no OS is available) or use sbrk/brk, which are kernel system calls. The question is: how does malloc/free know whether there is a kernel beneath it or not?
Answer (see the comment by "Kuba Ober" under his answer below):
malloc doesn't need to know anything, because the C library that you link your project with is specific to the target: if you develop for Linux, you use a C library for Linux, different from the one you use when you develop for OS X, or Windows, or bare bones ARM Cortex M0. Or, heck, barebones x86. It's the people who write the C library who know how to implement it so that it works on the desired target. For example, a barebones C library for x86 would use EFI and ACPI to query the list of available blocks of RAM used by neither hardware nor the BIOS, and then use those to fulfill allocation requests.

malloc() and free() do not require OS support. They can be (and often are!) implemented on bare-metal systems. For example, the Arduino library uses malloc() and free() to manage strings.
In hosted implementations (that is, an application running on an operating system), malloc() and free() will typically use operating system services to allocate new "hunks" of memory -- often as much as a few megabytes at a time -- and to return those hunks to the operating system when they are unused. Smaller allocations are handled by cutting those blocks of memory into the sizes needed by an application. This allows small allocations to be managed without the overhead of a system call.
In an unhosted implementation (like a bare-metal system), the application already has access to all memory in existence on the system, and can parcel out chunks of that memory however it likes.
At a lower level: both hosted and unhosted implementations of malloc() often work by treating each allocated or unallocated block of memory as an entry in a linked list. This is typically accomplished by storing a structure immediately before the start of each allocation, e.g.
struct malloc_block {
    struct malloc_block *prev, *next;
    size_t size;
    /* ... */
    char allocation[];
};
and returning a pointer to allocation as the return value of malloc(). Functions like realloc() and free() can retrieve the structure by subtracting the size of the structure from a pointer to the allocation.
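As a sketch of that header-before-payload technique: the struct below mirrors the one in the answer, and offsetof gives the distance from the header to the payload, so a free()/realloc()-style function can walk back from the user's pointer to the bookkeeping structure. The helper name header_of is made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>

struct malloc_block {
    struct malloc_block *prev, *next;
    size_t size;
    char allocation[];          /* flexible array member: the user's memory */
};

/* Recover the header from the pointer malloc() handed out, the way
 * free() and realloc() do internally. */
static struct malloc_block *header_of(void *p) {
    return (struct malloc_block *)
        ((char *)p - offsetof(struct malloc_block, allocation));
}
```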

The malloc/free functions manage a pool of memory. These functions are generally not operating system services themselves.
How then is that pool created?
On most systems, malloc calls operating system services to map pages into the process address space to create and expand the memory pool. If you call malloc and no memory is available, most implementations will call a system service to map more memory and expand the pool.
The malloc implementation needs to maintain data structures that keep track of what memory in the pool is free and what has been allocated. This is done in many different ways. It is not unusual for programmers to select a malloc/free combination that works best for them and link it into their application.
So, yes there is operating system involvement—generally.
But you asked about whether they can be implemented without an operating system.
Suppose you did:
static char pool[POOLSIZE];
in your malloc implementation. Then you would not require a system service to create the pool during execution time. On the other hand, your pool has a fixed size.

Generally speaking: no, or at least not at runtime, if we define runtime as the moments between main() entering and returning.
Suppose you implement malloc that operates on a fixed-size pool:
static char pool[MALLOC_POOL_SIZE];

void *malloc(size_t size) {
    …
}

void free(void *block) {
    …
}
Then, on both hosted and unhosted implementations you have working dynamic memory allocation. On hosted implementations, the binary loader will ensure that there's memory mapped behind the pool. On unhosted implementations, the linker will pre-locate the pool in available RAM, and linking will fail if the pool is too large.
So no, in general no OS involvement is needed once your code is running, but OS involvement is needed to get your code to run in the first place (if there is an OS).
And of course, “not needed” means not necessary, but doesn’t exclude OS support for dynamic memory allocation in a particular C runtime library. In most hosted runtimes, malloc uses a pool that’s dynamically expanded (and perhaps contracted) by invoking relevant OS APIs. On classic Unix, the expansion would be done via the brk syscall.
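The skeleton above can be completed with, for example, a minimal bump allocator over the static pool. This is only a sketch: free() here can reclaim only the most recent allocation (real implementations keep a free list instead), MALLOC_POOL_SIZE is an arbitrary illustrative value, and the functions are renamed pool_malloc/pool_free so they can coexist with the host C library.

```c
#include <stddef.h>

#define MALLOC_POOL_SIZE (16 * 1024)   /* illustrative pool size */

static char pool[MALLOC_POOL_SIZE];
static size_t next_free;               /* offset of the first unused byte */
static size_t last_alloc;              /* offset of the most recent block */

void *pool_malloc(size_t size) {
    size = (size + 7) & ~(size_t)7;    /* keep 8-byte alignment */
    if (size == 0 || size > MALLOC_POOL_SIZE - next_free)
        return NULL;                   /* pool exhausted */
    last_alloc = next_free;
    next_free += size;
    return &pool[last_alloc];
}

void pool_free(void *block) {
    /* LIFO-only reclamation: undo the bump if this was the last block */
    if (block == &pool[last_alloc])
        next_free = last_alloc;
}
```

Note that nothing here touches the OS: the pool is a static array the linker places, exactly as the answer describes.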

Related

Can I enforce sbrk return address to be within a certain specific range?

I want to make sure the return address of sbrk is within a certain specific range. I read somewhere that sbrk allocates from an area allocated at program initialization. So I'm wondering if there's any way I can force program initialization to allocate from a specific address? For example, with mmap I would be able to do so with MAP_FIXED_NOREPLACE. Is it possible to have something similar?
No, this is not possible. brk and sbrk refer to the data segment of the program, and that can be loaded at any valid address that meets the needs of the dynamic linker. Different architectures can and do use different addresses, and even machines of the same architecture can use different ranges depending on the configuration of the kernel. Using a fixed address or address range is extremely nonportable and will make your program very brittle to future changes. I fully expect that doing this will cause your program to break in the future simply by upgrading libc.
In addition, modern programs are typically compiled as position-independent executables so that ASLR can be used to improve security. Therefore, even if you knew the address range that was used for one invocation of your program, the very next invocation of your program might use a totally different address range.
In addition, you almost never want to invoke brk or sbrk by hand. In almost all cases, you will want to use the system memory allocator (or a replacement like jemalloc), which will handle this case for you. For example, glibc's malloc implementation, like most others, will allocate large chunks of memory using mmap, which can significantly reduce memory usage in long-running programs, since these large chunks can be freed independently. The memory allocator also may not appreciate you changing the size of the data segment without consulting it.
Finally, in case you care about portability to other Unix systems, not all systems even have brk and sbrk. OpenBSD allocates all memory using mmap which improves security by expanding the use of ASLR (at the cost of performance).
If you absolutely must use a fixed address or address range and there is no alternative, you'll need to use mmap to allocate that range of memory.

In malloc, why use brk at all? Why not just use mmap?

Typical implementations of malloc use brk/sbrk as the primary means of claiming memory from the OS. However, they also use mmap to get chunks for large allocations. Is there a real benefit to using brk instead of mmap, or is it just tradition? Wouldn't it work just as well to do it all with mmap?
(Note: I use sbrk and brk interchangeably here because they are interfaces to the same Linux system call, brk.)
For reference, here are a couple of documents describing the glibc malloc:
GNU C Library Reference Manual: The GNU Allocator
https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html
glibc wiki: Overview of Malloc
https://sourceware.org/glibc/wiki/MallocInternals
What these documents describe is that sbrk is used to claim a primary arena for small allocations, mmap is used to claim secondary arenas, and mmap is also used to claim space for large objects ("much larger than a page").
The use of both the application heap (claimed with sbrk) and mmap introduces some additional complexity that might be unnecessary:
Allocated Arena - the main arena uses the application's heap. Other arenas use mmap'd heaps. To map a chunk to a heap, you need to know which case applies. If this bit is 0, the chunk comes from the main arena and the main heap. If this bit is 1, the chunk comes from mmap'd memory and the location of the heap can be computed from the chunk's address.
[Glibc malloc is derived from ptmalloc, which was derived from dlmalloc, which was started in 1987.]
The jemalloc manpage (http://jemalloc.net/jemalloc.3.html) has this to say:
Traditionally, allocators have used sbrk(2) to obtain memory, which is suboptimal for several reasons, including race conditions, increased fragmentation, and artificial limitations on maximum usable memory. If sbrk(2) is supported by the operating system, this allocator uses both mmap(2) and sbrk(2), in that order of preference; otherwise only mmap(2) is used.
So, they even say here that sbrk is suboptimal but they use it anyway, even though they've already gone to the trouble of writing their code so that it works without it.
[Writing of jemalloc started in 2005.]
UPDATE: Thinking about this more, that bit about "in order of preference" gives me a line of inquiry. Why the order of preference? Are they just using sbrk as a fallback in case mmap is not supported (or lacks necessary features), or is it possible for the process to get into some state where it can use sbrk but not mmap? I'll look at their code and see if I can figure out what it's doing.
I'm asking because I'm implementing a garbage collection system in C, and so far I see no reason to use anything besides mmap. I'm wondering if there's something I'm missing, though.
(In my case I have an additional reason to avoid brk, which is that I might need to use malloc at some point.)
The system call brk() has the advantage of having only a single data item to track memory use, which happily is also directly related to the total size of the heap.
This has been in the exact same form since 1975's Unix V6. Mind you, V6 supported a user address space of 65,535 bytes. So there wasn't a lot of thought given to managing much more than 64K, certainly not terabytes.
Using mmap seems reasonable until you start wondering how an altered or added-on garbage collector could use mmap without also rewriting the allocation algorithm.
Will that work nicely with realloc(), fork(), etc.?
Calling mmap(2) once per memory allocation is not a viable approach for a general purpose memory allocator because the allocation granularity (the smallest individual unit which may be allocated at a time) for mmap(2) is PAGESIZE (usually 4096 bytes), and because it requires a slow and complicated syscall. The allocator fast path for small allocations with low fragmentation should require no syscalls.
So regardless of what strategy you use, you still need to support multiple instances of what glibc calls memory arenas, and the GNU manual mentions: "The presence of multiple arenas allows multiple threads to allocate memory simultaneously in separate arenas, thus improving performance."
The jemalloc manpage (http://jemalloc.net/jemalloc.3.html) has this to say:
Traditionally, allocators have used sbrk(2) to obtain memory, which is suboptimal for several reasons, including race conditions, increased fragmentation, and artificial limitations on maximum usable memory. If sbrk(2) is supported by the operating system, this allocator uses both mmap(2) and sbrk(2), in that order of preference; otherwise only mmap(2) is used.
I don't see how any of these apply to the modern use of sbrk(2), as I understand it. Race conditions are handled by threading primitives. Fragmentation is handled just as would be done with memory arenas allocated by mmap(2). The maximum usable memory is irrelevant, because mmap(2) should be used for any large allocation to reduce fragmentation and to release memory back to the operating system immediately on free(3).
The use of both the application heap (claimed with sbrk) and mmap introduces some additional complexity that might be unnecessary:
Allocated Arena - the main arena uses the application's heap. Other arenas use mmap'd heaps. To map a chunk to a heap, you need to know which case applies. If this bit is 0, the chunk comes from the main arena and the main heap. If this bit is 1, the chunk comes from mmap'd memory and the location of the heap can be computed from the chunk's address.
So the question now is, if we're already using mmap(2), why not just allocate an arena at process start with mmap(2) instead of using sbrk(2)? Especially so if, as quoted, it is necessary to track which allocation type was used. There are several reasons:
mmap(2) may not be supported.
sbrk(2) is already initialized for a process, whereas mmap(2) would introduce additional requirements.
As the glibc wiki says, "If the request is large enough, mmap() is used to request memory directly from the operating system [...] and there may be a limit to how many such mappings there can be at one time."
A memory map allocated with mmap(2) cannot be extended as easily. Linux has mremap(2), but its use limits the allocator to kernels which support it. Premapping many pages with PROT_NONE access uses too much virtual memory. Using MAP_FIXED unmaps any mapping which may have been there before without warning. sbrk(2) has none of these problems, and is explicitly designed to allow for extending its memory safely.
mmap() didn't exist in the early versions of Unix. brk() was the only way to increase the size of the data segment of the process at that time. The first version of Unix with mmap() was SunOS in the mid 80's, the first open-source version was BSD-Reno in 1990.
And to be usable for malloc() you don't want to require a real file to back up the memory. In 1988 SunOS implemented /dev/zero for this purpose, and in the 1990's HP-UX implemented the MAP_ANONYMOUS flag.
There are now versions of mmap() that offer a variety of methods to allocate the heap.
The obvious advantage is that you can grow the last allocation in place, which is something you can't do with mmap(2) (mremap(2) is a Linux extension, not portable).
For naive (and not-so-naive) programs which use realloc(3), e.g. to append to a string, this translates into a speed boost of one or two orders of magnitude ;-)
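The append pattern referred to above looks roughly like this sketch (the helper name append is made up): each call grows the buffer with realloc(3), and when the last heap block can be extended in place, as is easy with a brk-style contiguous heap, no copy is needed.

```c
#include <stdlib.h>
#include <string.h>

/* Grow a NUL-terminated string buffer and append `extra` to it.
 * Passing NULL as `buf` starts a fresh string (realloc(NULL, n) acts
 * as malloc(n)). */
char *append(char *buf, const char *extra) {
    size_t old = buf ? strlen(buf) : 0;
    char *p = realloc(buf, old + strlen(extra) + 1);
    if (!p) { free(buf); return NULL; }
    strcpy(p + old, extra);   /* correct whether or not the block moved */
    return p;
}
```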
I don't know the details on Linux specifically, but on FreeBSD for several years now mmap is preferred and jemalloc in FreeBSD's libc has sbrk() completely disabled. brk()/sbrk() are not implemented in the kernel on the newer ports to aarch64 and risc-v.
If I understand the history of jemalloc correctly, it was originally the new allocator in FreeBSD's libc before it was broken out and made portable. Now FreeBSD is a downstream consumer of jemalloc. It's very possible that its preference for mmap() over sbrk() originated with the characteristics of the FreeBSD VM system that was built around implementing the mmap interface.
It's worth noting that in SUS and POSIX brk/sbrk are deprecated and should be considered non-portable at this point. If you are working on a new allocator you probably don't want to depend on them.

Dynamic memory allocation in embedded C

Can I use the functions malloc and free in embedded C? For example, I have a function in which a pointer to a structure is created with malloc. The function returns an address in RAM that I can use. After exiting the function where the memory was allocated, will this pointer be deleted, or will the memory stay reserved until free is called?
typedef struct {
    char varA;
    char varB;
} myStruct;

void myfunc(void)
{
    myStruct *ptrStruct = (myStruct *)malloc(sizeof(myStruct));
    // Code here
    // ........
    return;
}
Generally, you shouldn't be using malloc in embedded systems, because doing so doesn't make any sense as explained here. In particular, it doesn't make any sense what-so-ever to use it on bare metal systems.
The only place where it makes sense to use dynamic memory allocation is large hosted, multi-process systems where multiple processes share the same RAM. If your definition of an embedded system is an Android smart phone or a portable PC, then yes it is fine to use malloc.
If you find yourself using it anywhere else, it almost certainly means that your program design is fundamentally flawed and also that you don't know how a heap works.
In addition, almost every embedded systems programming standard bans dynamic memory allocation.
There is nothing specific about embedded systems that prevents the use of dynamic memory.
However you may need to provide support for it in a number of ways, for example:
You need to ensure that the linker allocates sufficient space for the dynamic heap. Some linker scripts may already automatically allocate all remaining memory to the heap after stack and any other reserved allocations.
You may need to implement low-level stubs to allow the library to access heap memory - for example, with the newlib library you need to implement _sbrk_r() for malloc() etc. to work correctly.
In a multi-threaded system you may need to implement mutex stubs to ensure safe heap allocation. If the library does not provide such stubs, then malloc()/free() etc. will not be safe to use in such an environment, and you should write wrapper functions that assert the locks externally.
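As a sketch of that last point, externally locking wrapper functions might look like the following. The names safe_malloc/safe_free are illustrative, and a pthread mutex stands in for whatever mutex primitive your RTOS actually provides on a bare-metal target.

```c
#include <pthread.h>
#include <stdlib.h>

/* Serialize all heap access through one lock when the C library itself
 * provides no thread-safety stubs. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

void *safe_malloc(size_t size) {
    pthread_mutex_lock(&heap_lock);
    void *p = malloc(size);
    pthread_mutex_unlock(&heap_lock);
    return p;
}

void safe_free(void *p) {
    pthread_mutex_lock(&heap_lock);
    free(p);
    pthread_mutex_unlock(&heap_lock);
}
```

This only works if every caller goes through the wrappers; a single direct malloc() call from another thread defeats the lock.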
There are, however, a number of reasons why you might choose to avoid using dynamic memory (or at least standard-library-implemented dynamic memory) in an embedded system:
Standard allocation schemes have non-deterministic timing unsuited to hard-real-time systems.
You need to handle the possibility of allocation failure gracefully for every allocation. Handling a potential non-deterministic run-time error safely is more complex than simply having the compiler tell you have insufficient memory at build time.
You need to guard against memory leaks; true of any system, but with no OS to manage memory exhaustion and kill a leaking process how will your system behave?
The standard library heap management may not be thread-safe without mutex stubs or wrapper functions.
Bugs that corrupt the heap are unlikely to affect execution immediately, often only causing an observable failure when a new heap operation is performed, resulting in non-deterministic behaviour at a time and location unrelated to the actual cause - making them very hard to diagnose. Again this is true of any system, but the debug facilities in a cross-hosted embedded system are often less sophisticated than on a self-hosted system.
Yes, you can use malloc in embedded C. Some embedded systems have their own encapsulated memory allocation APIs; malloc() is the C library API.
The memory is allocated from the heap, a dedicated memory range defined by the system designer. If you do not free the allocated memory after your function exits, it remains reserved and other processes cannot use it; typically, that is a memory leak. If you free the allocated memory but still use the pointer after that, it is a dangling pointer and will cause undefined behaviour.

C Static and Auto allocation

When a C program is started, how does it ask the operating system for enough memory for static variables?
And while running, how does it ask the operating system for memory for automatic variables?
I would also like to know how these memory spaces are released after execution.
Please try to be as accurate as possible. If the explanation varies between operating systems, please give preference to UNIX-like ones.
Global variables and those with static life time are usually stored in a data segment which is setup by the operating system's executable loader.
This loader probably does what @John Zwinck described on Unix. On Windows there is VirtualAlloc, for example, which can also be used to allocate memory in the address space of another program.
Local variables are usually stored on the so called stack. Allocations on the stack are pretty fast as they usually just consist of a modification of the stack pointer register (sp, esp, rsp on x86 processor family). So when you have an int (size: 4 bytes) that register would simply be decremented by 4 as the stack grows downwards. At the end of the scope the old state of the stack register is restored.
This is also what makes stack overflows dangerous, where you can overwrite other variables on the stack that should not be modified, like return addresses of function calls.
Dynamic variables are variables allocated using malloc (C) or new (C++) or any of the operating system specific allocation functions. These are placed on the so called heap. These live until they are cleaned up using free/delete/os-specific-deallocator or the program exits (in that case a sane operating system takes care of the cleanup).
Also dynamic allocation is the slowest of the three as it requires a call to the operating system.
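The three storage durations described above can be illustrated with a short sketch (the names count and make_counter are made up for the example):

```c
#include <stdlib.h>

static int calls;   /* static duration: zero-initialized, lives in the data/BSS
                       segment set up by the loader before main() runs */

int count(void) {
    int tmp = 0;    /* automatic: a fresh instance on the stack each call,
                       gone again when the function returns */
    tmp += 1;
    calls += tmp;   /* the static counter persists across calls */
    return calls;
}

int *make_counter(void) {
    int *dyn = malloc(sizeof *dyn);  /* dynamic: lives on the heap until free() */
    if (dyn) *dyn = 0;
    return dyn;
}
```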
Memory allocation on Unix-like systems is done via calls to the operating system using the sbrk() and mmap() APIs.
sbrk() is used to enlarge the "data segment" which is a contiguous range of (virtual) addresses. mmap() is used in many modern systems as a sort of supplement to this, because it can allocate chunks which can later be deallocated independently (meaning no "holes" will remain as can happen with sbrk()).
In C you have malloc() as the user-facing API for memory allocation. You can read more about how that maps to the low-level functions I mentioned earlier, here: How are malloc and free implemented?
Static variables are found in the data and BSS segments behind the code. Auto variables are located on the stack at the end of the virtual memory of the process. Both layouts are determined at compile time; the memory layout itself is then created at program startup.
brk(), sbrk() and mmap() can manipulate the virtual memory (the heap in particular) at run time (e.g. via malloc()/free()), but these functions are not related to static and auto variables!

malloc in an embedded system without an operating system

This query is regarding allocation of memory using malloc.
Generally what we say is that malloc allocates memory from the heap.
Now say I have a plain embedded system (no operating system) with a normal program loaded, in which I call malloc.
In this case, where is the memory allocated from?
malloc() is a function that is usually implemented by the runtime library. You are right: if you are running on top of an operating system, then malloc will sometimes (but not every time) trigger a system call that makes the OS map some memory into your program's address space.
If your program runs without an operating system, then you can think of your program as being the operating system. You have access to all addresses, meaning you can just assign an address to a pointer, then de-reference that pointer to read/write.
Of course you have to make sure that other parts of your program don't simply use the same memory, so you write your own memory manager:
To put it simply, you can set aside a range of addresses which your "memory manager" uses to record which address ranges are already in use (the data structures stored there can be as simple as a linked list or much more complex). Then you write a function, call it e.g. malloc(), which forms the functional part of your memory manager. It looks into that data structure to find a free range of addresses at least as long as the argument specifies and returns a pointer to it.
Now, if every function in your program calls your malloc() instead of writing to arbitrary addresses, you've taken the first step. You can then write a free() function that looks for the pointer it is given in that data structure and adapts the data structure accordingly (in the naive linked-list case it would merge two links).
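A compact sketch of such a memory manager, under stated assumptions: a first-fit free list carved out of a static pool, with splitting on allocation and merging of adjacent free blocks on free. All names (my_malloc, my_free, POOL_SIZE, the struct layout) are illustrative, not any particular library's implementation.

```c
#include <stddef.h>

#define POOL_SIZE (8 * 1024)

typedef struct block {
    size_t size;            /* payload bytes following this header */
    int free;
    struct block *next;
} block_t;

static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static block_t *head;       /* lazily initialized list covering the pool */

static void pool_init(void) {
    head = (block_t *)pool;
    head->size = POOL_SIZE - sizeof(block_t);
    head->free = 1;
    head->next = NULL;
}

void *my_malloc(size_t size) {
    if (!head) pool_init();
    size = (size + 7) & ~(size_t)7;               /* 8-byte alignment */
    for (block_t *b = head; b; b = b->next) {
        if (b->free && b->size >= size) {
            if (b->size >= size + sizeof(block_t) + 8) {
                /* split: carve a new free block from the tail */
                block_t *rest = (block_t *)((unsigned char *)(b + 1) + size);
                rest->size = b->size - size - sizeof(block_t);
                rest->free = 1;
                rest->next = b->next;
                b->next = rest;
                b->size = size;
            }
            b->free = 0;
            return b + 1;                         /* payload follows header */
        }
    }
    return NULL;                                  /* pool exhausted */
}

void my_free(void *p) {
    if (!p) return;
    block_t *b = (block_t *)p - 1;                /* recover the header */
    b->free = 1;
    if (b->next && b->next->free) {               /* merge with next block */
        b->size += sizeof(block_t) + b->next->size;
        b->next = b->next->next;
    }
}
```

A production allocator would also merge with the previous block (hence the prev pointers mentioned earlier) and handle alignment more carefully, but this shows the shape of the bookkeeping.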
The only real answer is "Wherever your compiler/library-implementation puts it".
In the embedded system I use, there is no heap, since we haven't written one.
From the heap as you say. The difference is that the heap is not provided by the OS. Your application's linker script will no doubt include an allocation for the heap. The run-time library will manage this.
In the case of the newlib C library, often used in GCC-based embedded systems not running an OS (or at least not running Linux), the library has a stub syscall function called sbrk(). It is the responsibility of the developer to implement sbrk(), which must provide more memory to the heap manager on request. Typically it merely increments a pointer and returns a pointer to the start of the new block; thereafter the library's heap manager manages and maintains the new block, which may or may not be contiguous with previous blocks. The previous link includes an example implementation.
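A hedged sketch of such a stub: in a real newlib port the heap bounds come from linker-script symbols (e.g. extern char __heap_start[], __heap_end[]; the exact names depend on your linker script), while a static array stands in here so the sketch is self-contained. For simplicity this version rejects negative increments, which real sbrk() accepts to shrink the heap.

```c
#include <errno.h>
#include <stddef.h>

static char heap[16 * 1024];        /* stand-in for the linker-defined region */
static char *brk_ptr = heap;        /* current top of the heap */

void *_sbrk(ptrdiff_t incr) {
    char *prev = brk_ptr;
    if (incr < 0 || incr > (heap + sizeof heap) - brk_ptr) {
        errno = ENOMEM;             /* the heap manager checks for (void *)-1 */
        return (void *)-1;
    }
    brk_ptr += incr;
    return prev;                    /* start of the newly granted block */
}
```

newlib's malloc calls this whenever its current pool cannot satisfy a request, exactly as described above.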
