I've been thinking for days about the following question:
On a typical PC, when you allocate some memory, you ask the OS for it; the OS keeps track of which memory segments are occupied and which ones are not, doesn't let you mess around with other programs' memory, etc.
But what about a microcontroller? A microcontroller doesn't have an operating system running, so when you ask for a bunch of memory, what is going on? You cannot simply access a random place in the memory chip, because it may already be occupied... so who keeps track of which parts of memory are already occupied, and gives you a free place to store something?
EDIT:
I've programmed microcontrollers in C... and I was thinking that the answer could be language-independent. But let me be more clear: suppose I have this program running on a microcontroller:
int i=0;
int d=3;
what makes sure that my i and d variables are not stored at the same place in memory?
I think the comments have already covered this...
To ask for memory implies you have some operating system managing memory that you are mallocing from (using the term operating system loosely). As a general rule, you shouldn't be mallocing memory on a microcontroller at all (I may get flamed for that statement). It can be done in some cases, but you are in control of your memory: you own the system with your application, so asking for memory means asking yourself for it.
Unless you have reasons why you cannot statically allocate your structures or arrays, do that; use a union if there are mutually exclusive code paths that might each want much or all of the spare memory (see the sketch below). You can try to allocate dynamically and free, but it is a harder system-engineering problem to solve.
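For example, a minimal sketch of the union trick; the buffer names and sizes here are made up for illustration:

union scratch {
    unsigned char rx_buffer[512];     /* used only while receiving */
    unsigned char render_buffer[512]; /* used only while rendering */
};

static union scratch scratch; /* one statically allocated block serves both paths */

The linker reserves exactly one 512-byte region, and the two code paths share it because they can never be active at the same time.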
There is a difference between run-time allocation of memory and compile-time allocation. Your example has nothing to do with the rest of the question:
int i=0;
int d=3;
The compiler, at compile time, allocates two locations, one for each of those items (typically d in .data for initialized data, and i in .bss for zero-initialized data, though some toolchains put both in .data). The linker and/or linker script manages where those sections live and what their size limits are. If .data is bigger than what is available, you should get a linker error; if you don't, you need to fix your linker commands or script to match your system.
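As a quick illustration (shown as hosted C for convenience; on a microcontroller you would inspect the map file instead), each definition gets its own link-time address, so the two variables cannot overlap:

#include <stdio.h>

int i = 0; /* zero-initialized: most toolchains place this in .bss */
int d = 3; /* initialized: placed in .data */

int main(void)
{
    /* The linker has assigned each variable its own distinct address. */
    printf("&i = %p\n", (void *)&i);
    printf("&d = %p\n", (void *)&d);
    return 0;
}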
Run-time allocation is managed at run time, and where and how the memory is managed is determined by that library. Even if you have plenty of memory, a bad or improperly written library could overlap .text, .data, .bss and/or the stack and cause a lot of problems.
Excessive use of the stack is also a pretty serious system-engineering problem, one that people coming from non-embedded systems often overlook these days because there is so much memory. It is a very real problem when dealing with embedded code on a microcontroller. You need to know your worst-case stack usage and leave room for at least that much memory, whether you are going to have a heap to dynamically allocate from or you statically allocate everything.
Related
I have read a lot of explanations of heap and stack memory, and all of them are vague about where this division comes from. I understand how these memories work with software, but I don't understand the origin of the division itself. I assume that they are the same unspecialized physical memory, but...
For example, say we have a PC without any OS, and we want to create some bootable program in assembly language for x86. I assume we can do this (I personally don't know assembly, but some people write OSes, after all). So the main question is: can we already operate with a heap and a stack, or must we create some memory-management machinery for them first? If so, how is that possible in terms of bare-metal programming?
Adding something to the other answer, which is fairly correct but perhaps not very complete.
Heap and stack are two (software) ways to "manage" memory. The physical memory, normally, is a flat array of cells where a program can read and write. It is up to the running program to use those cells as it wants. But there is more to say.
First: the heap is totally software, while the stack is also (or mainly) a hardware thing. Most processors have hardware (or CPU instructions) to support the stack, while most (or all?) don't care about the heap. Even more: there are small embedded processors (or microcontrollers) which have a separate stack area, totally different from the other RAM areas where the program could create a "heap".
Second: when speaking about "programs", one can (and should) think of the operating system (the OS) as a program, specialized in managing resources (memory included), and extended with "applications" (which are programs too). In such a scenario, stack and heap are managed in cooperation by both the OS and the applications.
So, to reply to your main question, the 90%-correct answer is: on bare metal you already have a stack; perhaps you have to issue a few short instructions to set it up, but it is straightforward. You do not, however, have a heap: you must implement one in your program. First you set aside some memory to be used as the stack; then you can set aside some more memory to be used as the heap, not forgetting that you must preserve some memory for normal/static data. The part of the program that manages the heap must know what it is doing, using but never erratically overwriting the stack and the static data.
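As a hedged sketch of the smallest possible heap (all names and sizes here are invented): set aside a static array and hand out pieces of it with a bump pointer. There is no free() and no alignment beyond sizeof(long); a real heap needs much more care.

#include <stddef.h>

#define HEAP_SIZE 4096

static unsigned char heap[HEAP_SIZE]; /* the memory we set aside as "heap" */
static size_t heap_used = 0;

void *bump_alloc(size_t n)
{
    /* Round up so every pointer we hand out is word-aligned. */
    const size_t a = sizeof(long);
    n = (n + a - 1) & ~(a - 1);

    if (n > HEAP_SIZE - heap_used)
        return NULL; /* heap exhausted */

    void *p = &heap[heap_used];
    heap_used += n;
    return p;
}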
I'm trying to reconcile a few concepts.
I know that virtual memory is shared (mapped) between the kernel and all user processes, which I read here. I also know that when the compiler generates addresses for code + data, the kernel must load them at the correct virtual addresses for that process.
To constrain the scope of the question, I'll just mean gcc when I mention 'the compiler'.
So does the compiler need to be updated for each new release of an OS, so that it knows not to place code or data at the high memory addresses reserved for the kernel? As in, someone writing that piece of the compiler must know those details of how the kernel plans to load the program (lest the compiler put executable code in high memory)?
Or am I confusing different concepts? I got a bit confused when going through this tutorial, especially at the very bottom where it has OS code in low memory addresses, because I thought Linux uses high memory for the kernel.
The compiler doesn't determine the address ranges in memory at which things are placed. That's handled by the OS.
When the program is first executed, the loader places the various portions of the program and its libraries in memory. For memory that's allocated dynamically, large chunks are allocated from the OS and then sometimes divided into smaller chunks.
The OS loader knows where to load things, and the OS's virtual memory allocation logic knows how to find safe, empty spaces in the address space the process uses.
I'm not sure what you mean by the "high memory addresses reserved for the kernel". If you're talking about a 2G/2G or 3G/1G split on a 32-bit operating system, that is a fundamental design element of those OSes that use it. It doesn't change with versions.
If you're talking about high physical memory, then no. Compilers don't care about physical memory.
Linux gives each application its own memory space, distinct from the kernel. The page table contains the translations between this memory space and physical RAM, and the kernel sets up the page table so there's no interference.
That said, the compiler usually doesn't even care where the program is loaded in memory. Why would it?
I would like to be able to debug how much total memory is being used by a C program in a limited-resource environment of 256 KB of memory (currently I am testing in an emulator).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much my C program is using (including globals, local variables [from perspective of my main function loop], the program code itself etc..)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
-Edit- The CPU is a Hitachi SH2; I don't have an IDE that lets me put breakpoints into the program.
Using the IDE options, take the proper actions (mark a checkbox, probably) so that the build process (namely, the linker) generates a map file.
A map file for an embedded system will normally give you the information you need in a detailed fashion: the memory segments, their sizes, how much memory is utilized in each one, program memory, data memory, etc. There is usually a lot of data in the map file, and you might need to write a script, or copy it into Excel, to calculate exactly what you need. The map file might also contain summary information.
The stack is a bit trickier. If the map file gives it, there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Stop the program at its very start (with a breakpoint, or whatever your emulator offers) and zero the entire stack. Resume the application and let it work for a while. Finally, stop it and inspect the stack memory: you will see non-zero values where the stack has been touched, and the used stack extends until the zeroed part starts again. A sketch of this technique in code follows.
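Since you can print debug output, the same idea can be done in code. A hedged sketch: the region symbols below are hypothetical (real names come from your linker script or startup code), and real code should paint only below the current stack pointer rather than the whole region.

#include <stdint.h>
#include <stddef.h>

extern uint32_t __stack_bottom__; /* lowest address of the stack region */
extern uint32_t __stack_top__;    /* one past the highest stack address */

#define STACK_PAINT 0xDEADBEEFu

void stack_paint(void) /* call from startup code, before the stack is deep */
{
    for (uint32_t *p = &__stack_bottom__; p < &__stack_top__; ++p)
        *p = STACK_PAINT;
}

size_t stack_high_water(void) /* bytes of stack that were ever used */
{
    uint32_t *p = &__stack_bottom__;
    while (p < &__stack_top__ && *p == STACK_PAINT)
        ++p; /* skip the untouched, still-painted part */
    return (size_t)((uint8_t *)&__stack_top__ - (uint8_t *)p);
}

Call stack_paint() as early as possible, let the system run for a while, then periodically print stack_high_water() to your debug screen.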
Generally you will have different sections in the generated map file, where the data goes, like:
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text ... and so on,
with other attributes like Base, Size (hex), Size (dec), etc. for each section.
While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single-threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part is dynamically allocated data, but even then such data is allocated from the heap, which in most bare-metal, single-threaded environments is a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file is generated and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst case stack usage for any particular function. In a single threaded environment, the stack usage from main() will give worst case overall usage (although interrupt handlers need consideration, the linker is not thread or interrupt aware, and some architectures have separate interrupt stacks, some are shared).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
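For example, one hedged way to collect such statistics without library support is to route all allocations through your own wrappers. The size-header trick below ignores worst-case alignment (real code should use a max_align_t-sized header) and is not interrupt-safe:

#include <stdlib.h>

static size_t live_allocs = 0; /* number of outstanding allocations */
static size_t live_bytes  = 0; /* payload bytes currently allocated */

void *counted_malloc(size_t n)
{
    /* Prepend the size so counted_free() can account for it later. */
    size_t *p = malloc(sizeof(size_t) + n);
    if (p == NULL)
        return NULL;
    *p = n;
    live_allocs++;
    live_bytes += n;
    return p + 1;
}

void counted_free(void *ptr)
{
    if (ptr == NULL)
        return;
    size_t *p = (size_t *)ptr - 1; /* step back to the size header */
    live_allocs--;
    live_bytes -= *p;
    free(p);
}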
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().
Today, I appeared for an interview and the interviewer asked me this,
Tell me the steps: how would you design your own free() function to
deallocate allocated memory?
How can it be more efficient than C's default free() function? What can you conclude?
I was confused, couldn't think of the way to design.
What do you think guys ?
EDIT: Since we need to know how malloc() works, can you tell me the steps to write our own malloc() function?
That's actually a pretty vague question, and that's probably why you got confused. Does he mean, given an existing malloc implementation, how would you go about trying to develop a more efficient way to free the underlying memory? Or was he expecting you to start discussing different kinds of malloc implementations and their benefits and problems? Did he expect you to know how virtual memory functions on the x86 architecture?
Also, by more efficient, does he mean more space efficient or more time efficient? Does free() have to be deterministic? Does it have to return as much memory to the OS as possible because it's in a low-memory, multi-tasking environment? What's our criteria here?
It's hard to say where to start with a vague question like that, other than to start asking your own questions to get clarification. After all, in order to design your own free function, you first have to know how malloc is implemented. So chances are, the question was really about whether or not you knew anything about how malloc can be implemented.
If you're not familiar with the internals of memory management, the easiest way to get started with understanding how malloc is implemented is to first write your own.
Check out this IBM DeveloperWorks article called "Inside Memory Management" for starters.
But before you can write your own malloc/free, you first need memory to allocate/free. Unfortunately, in a protected mode OS, you can't directly address the memory on the machine. So how do you get it?
You ask the OS for it. With the virtual memory features of the x86, any piece of RAM or swap memory can be mapped to a memory address by the OS. What your program sees as memory could be physically fragmented throughout the entire system, but thanks to the kernel's virtual memory manager, it all looks the same.
The kernel usually provides system calls that allow you to map in additional memory for your process. On older UNIX OS's this was usually brk/sbrk to grow heap memory onto the edge of your process or shrink it off, but a lot of systems also provide mmap/munmap to simply map a large block of heap memory in. It's only once you have access to a large, contiguous looking block of memory that you need malloc/free to manage it.
Once your process has some heap memory available to it, it's all about splitting it into chunks, with each chunk containing its own meta-information about its size and position and whether or not it's allocated, and then managing those chunks. A simple list of structs, each containing some fields for meta-information and a large array of bytes, could work; in that case malloc has to run through the list until it finds a large enough unallocated chunk (or chunks it can combine), mapping in more memory if it can't find a big enough one. Once you find a chunk, you just return a pointer to the data. free() can then use that pointer to step back a few bytes to the member fields of the structure, which it can then modify (i.e. marking chunk.allocated = false;). If there are enough unallocated chunks at the end of your list, you can even remove them from the list and unmap or shrink that memory off your process's heap.
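A toy, first-fit version of that scheme, for illustration only (no splitting, coalescing, or thread safety; it also assumes the pool array is suitably aligned for the header struct):

#include <stddef.h>

#define POOL_SIZE 4096

struct chunk {
    size_t size;      /* payload size in bytes            */
    int    allocated; /* nonzero while handed to a caller */
};

static unsigned char pool[POOL_SIZE];
static size_t pool_used = 0; /* bytes carved off so far */

void *toy_malloc(size_t n)
{
    /* Keep headers aligned by rounding every request up. */
    const size_t a = sizeof(size_t);
    n = (n + a - 1) & ~(a - 1);

    /* First fit: walk existing chunks looking for a free one big enough. */
    size_t off = 0;
    while (off < pool_used) {
        struct chunk *c = (struct chunk *)&pool[off];
        if (!c->allocated && c->size >= n) {
            c->allocated = 1;
            return c + 1; /* payload follows the header */
        }
        off += sizeof(struct chunk) + c->size;
    }

    /* No luck: carve a fresh chunk from the unused tail of the pool. */
    if (sizeof(struct chunk) + n > POOL_SIZE - pool_used)
        return NULL;
    struct chunk *c = (struct chunk *)&pool[pool_used];
    c->size = n;
    c->allocated = 1;
    pool_used += sizeof(struct chunk) + n;
    return c + 1;
}

void toy_free(void *ptr)
{
    if (ptr == NULL)
        return;
    /* Step back over the header, exactly as described above. */
    struct chunk *c = (struct chunk *)ptr - 1;
    c->allocated = 0;
}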
That's a really simple method of implementing malloc, though. As you can imagine, there are a lot of possible ways of splitting your memory into chunks and then managing those chunks; there are as many ways as there are data structures and algorithms. They're all designed for different purposes, too, like limiting fragmentation due to small allocated chunks mixed with small unallocated chunks, or ensuring that malloc and free run fast (or sometimes more slowly, but predictably slowly). There's dlmalloc, ptmalloc, jemalloc, Hoard's malloc, and many more out there, and many of them are quite small and succinct, so don't be afraid to read them. If I remember correctly, "The C Programming Language" by Kernighan and Ritchie even uses a simple malloc implementation as one of its examples.
You can't blindly design free() without knowing how malloc() works under the hood, because your implementation of free() would need to manipulate the bookkeeping data, and that's impossible without knowing how malloc() is implemented.
So a more answerable question would be how you would design malloc() and free() together, which is not a trivial question, but one you could answer partially, for example by proposing a very simple memory-pool implementation. It would not be equivalent to malloc(), of course, but it would demonstrate your knowledge.
One common approach when you only have access to user space (generally known as a memory pool) is to get a large chunk of memory from the OS at application start-up. Your malloc needs to check which areas of the right size in that pool are still free (through some data structure) and hand out pointers to that memory. Your free needs to mark the memory as free again in the data structure, and possibly needs to check for fragmentation of the pool.
The benefit is that you can do allocation in nearly constant time; the drawback is that your application consumes more memory than is actually needed.
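A hedged sketch of such a pool with fixed-size blocks (the block size and count are invented for the example); both operations are constant time:

#include <stddef.h>

#define BLOCK_SIZE  64
#define BLOCK_COUNT 128

static union block {
    union block   *next;             /* valid while the block is free      */
    unsigned char  data[BLOCK_SIZE]; /* valid while the block is allocated */
} blocks[BLOCK_COUNT];

static union block *free_list;

void pool_init(void) /* call once at start-up */
{
    /* Thread every block onto the free list. */
    for (int i = 0; i < BLOCK_COUNT - 1; i++)
        blocks[i].next = &blocks[i + 1];
    blocks[BLOCK_COUNT - 1].next = NULL;
    free_list = &blocks[0];
}

void *pool_alloc(void) /* O(1): pop the head of the free list */
{
    union block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

void pool_free(void *ptr) /* O(1): push the block back on the list */
{
    union block *b = ptr;
    b->next = free_list;
    free_list = b;
}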
Tell me the steps: how would you design your own free() function to deallocate allocated memory?
#include <stdlib.h>
#undef free
#define free(X) my_free(X)
static inline void my_free(void *ptr) { (void)ptr; /* deliberately do nothing */ }
How can it be more efficient than C's default free() function?
It is extremely fast, requiring zero machine cycles. It also makes use-after-free bugs go away. It's a very useful free function for use in programs which are instantiated as short-lived batch processes; it can usefully be deployed in some production situations.
What can you conclude?
I really want this job, but in another company.
Memory usage patterns could be a factor. A default implementation of free can't assume anything about how often you allocate/deallocate and what sizes you allocate when you do.
For example, if you frequently allocate and deallocate objects that are of similar size, you could gain speed, memory efficiency, and reduced fragmentation by using a memory pool.
EDIT: as sharptooth noted, it only makes sense to design free and malloc together. So the first thing would be to figure out how malloc is implemented.
malloc and free only have meaning if your app is to work on top of an OS. If you would like to write your own memory-management functions, you would have to know how to request memory from that specific OS. Alternatively, you could reserve the heap memory right away using the existing malloc, and then use your own functions to distribute/redistribute the allocated memory throughout your app.
There is an architecture that malloc and free are supposed to adhere to -- essentially a class architecture permitting different strategies to coexist. Then the version of free that is executed corresponds to the version of malloc used.
However, I'm not sure how often this architecture is observed.
Knowledge of how malloc() works is necessary to implement free(). You can find an implementation of malloc() and free() using the sbrk() system call in K&R, The C Programming Language, Chapter 8, Section 8.7, "Example--A Storage Allocator", pp. 185-189.
I am writing C for an MPC 555 board and need to figure out how to allocate dynamic memory without using malloc.
Typically malloc() is implemented on Unix using sbrk() or mmap(). (If you use the latter, you want to use the MAP_ANON flag.)
If you're targeting Windows, VirtualAlloc may help. (More or less functionally equivalent to anonymous mmap().)
Update: Didn't realize you weren't running under a full OS; I somehow got the impression this might be a homework assignment running on top of a Unix system or something...
If you are doing embedded work and you don't have a malloc(), I think you should find some memory range that it's OK for you to write on, and write your own malloc(). Or take someone else's.
Pretty much the standard one that everybody borrows from was written by Doug Lea at SUNY Oswego. For example glibc's malloc is based on this. See: malloc.c, malloc.h.
You might want to check out Ralph Hempel's Embedded Memory Manager.
If your runtime doesn't support malloc, you can find an open source malloc and tweak it to manage a chunk of memory yourself.
malloc() is an abstraction that is used to allow C programs to allocate memory without having to understand the details of how memory is actually obtained from the operating system. If you can't use malloc, then you have no choice other than to use whatever facilities for memory allocation are provided by your operating system.
If you have no operating system, then you must have full control over the layout of memory. At that point, for simple systems, the easiest solution is to just make everything static and/or global; for more complex systems, you will want to reserve some portion of memory for a heap allocator and then write (or borrow) some code that uses that memory to implement malloc.
An answer really depends on why you might need to dynamically allocate memory. What is the system doing that it needs to allocate memory yet cannot use a static buffer? The answer to that question will guide your requirements in managing memory. From there, you can determine which data structure you want to use to manage your memory.
For example, a friend of mine worked on something like a video game, which rendered video to the screen scan-line by scan-line. That team determined that memory would be allocated for each scan-line, but there was a specific limit on how many bytes that could be for any given scene. After rendering each scan-line, all the temporary objects allocated during that rendering were freed.
To avoid the possibility of memory leaks, and for performance reasons (this was in the '90s and computers were slower then), they took the following approach: they pre-allocated a buffer large enough to satisfy all the allocations for a scan-line, according to the scene parameters that determined the maximum size needed. At the beginning of each scan-line, a global pointer was set to the beginning of the buffer. As each object was allocated from this buffer, the global pointer value was returned, and the pointer was advanced to the next machine-word-aligned position past the allocated bytes. (This alignment padding was included in the original calculation of the buffer size; in the '90s that meant four bytes, but it should now be 16 bytes on some machinery.) At the end of each scan-line, the global pointer was reset to the beginning of the buffer.
In "debug" builds, there were two scan buffers, which were protected using virtual memory protection during alternating scan lines. This method detects stale pointers being used from one scan-line to the next.
The buffer of scan-line memory may be called a "pool" or "arena", depending on whom you ask. The relevant detail is that this is a very simple data structure which manages memory for a certain task; it is not a general memory manager (or, properly, "free store implementation") such as malloc, which might be what you are asking for.
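A sketch of such an arena, reduced to its essentials (the size and the 16-byte alignment are illustrative, not taken from the original project):

#include <stddef.h>

#define ARENA_SIZE  65536
#define ARENA_ALIGN 16 /* was 4 bytes in the '90s example */

static unsigned char arena[ARENA_SIZE];
static size_t arena_next = 0; /* the "global pointer" from the story */

void *arena_alloc(size_t n)
{
    /* Advance to the next aligned position past the allocation. */
    n = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (n > ARENA_SIZE - arena_next)
        return NULL; /* the scan-line exceeded its budget */
    void *p = &arena[arena_next];
    arena_next += n;
    return p;
}

void arena_reset(void) /* "free" everything at the end of a scan-line */
{
    arena_next = 0;
}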
Your application may require a different data structure to keep track of your free storage. What is your application?
You should explain why you can't use malloc(), as there might be different solutions for different reasons, and there are several reasons why it might be forbidden or unavailable on small/embedded systems:
concern over memory fragmentation. In this case a set of routines that allocate fixed size memory blocks for one or more pools of memory might be the solution.
the runtime doesn't provide a malloc() - I think most modern toolsets for embedded systems do provide some way to link in a malloc() implementation, but maybe you're using one that doesn't, for whatever reason. In that case, using Doug Lea's public-domain malloc might be a good choice, but it might be too large for your system (I'm not familiar with the MPC 555 off the top of my head). If that's the case, a very simple, custom malloc() facility might be in order. It's not too hard to write, but make sure you unit-test the hell out of it, because it's also easy to get details wrong. For example, I have a set of very small routines that use a brain-dead memory allocation strategy using blocks on a free list (the allocator can be compile-time configured for first, best, or last fit). I give it an array of char at initialization, and subsequent allocation calls will split free blocks as necessary. It's nowhere near as sophisticated as Lea's malloc(), but it's pretty dang small, so for simple uses it'll do the trick.
many embedded projects forbid the use of dynamic memory allocation - in this case, you have to live with statically allocated structures
Write your own. Since your allocator will probably be specialized to a few types of objects, I recommend the Quick Fit scheme developed by Bill Wulf and Charles Weinstock. (I have not been able to find a free copy of this paper, but many people have access to the ACM digital library.) The paper is short, easy to read, and well suited to your problem.
If you turn out to need a more general allocator, the best guide I have found on the topic of programming on machines with fixed memory is Donald Knuth's book The Art of Computer Programming, Volume 1. If you want examples, you can find good ones in Don's epic book-length treatment of the source code of TeX, TeX: The Program.
Finally, the undergraduate textbook by Bryant and O'Hallaron is rather expensive, but it goes through the implementation of malloc in excruciating detail.
Write your own. Preallocate a big chunk of static RAM, then write some functions to grab and release chunks of it. That's the spirit of what malloc() does, except that it asks the OS to allocate and deallocate memory pages dynamically.
There are a multitude of ways of keeping track of what is allocated and what is not (bitmaps, used/free linked lists, binary trees, etc.). You should be able to find many references with a few choice Google searches.
malloc() and its related functions are the only game in town. You can, of course, roll your own memory management system in whatever way you choose.
If there are issues allocating dynamic memory from the heap, you can try allocating memory from the stack using alloca() (sketched after the caveats below). The usual caveats apply:
The memory is gone when you return.
The amount of memory you can allocate is dependent on the maximum size of your stack.
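For example (hedged: <alloca.h> is the glibc header; other toolchains declare alloca() in <malloc.h> or as a compiler built-in):

#include <alloca.h>
#include <stddef.h>
#include <string.h>

void worker(size_t n)
{
    char *buf = alloca(n); /* lives in worker()'s stack frame; a large n
                              can overflow the stack with no error report */
    memset(buf, 0, n);
    /* ... use buf while worker() is running ... */
} /* buf vanishes here; never return or store this pointer */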
You might be interested in: liballoc
It's a simple, easy-to-implement malloc/free/calloc/realloc replacement which works.
If you know beforehand or can figure out the available memory regions on your device, you can also use their libbmmm to manage these large memory blocks and provide a backing-store for liballoc. They are BSD licensed and free.
FreeRTOS contains 3 example implementations of memory allocation (including malloc()) to achieve different optimizations and use cases appropriate for small embedded systems (AVR, ARM, etc.). See the FreeRTOS manual for more information.
I don't see a port for the MPC555, but it shouldn't be difficult to adapt the code to your needs.
If the library supplied with your compiler does not provide malloc, then it probably has no concept of a heap.
A heap (at least in an OS-less system) is simply an area of memory reserved for dynamic memory allocation. You can reserve such an area simply by creating a suitably sized, statically allocated array, and then providing an interface that hands out contiguous chunks of that array on demand and keeps track of which chunks are in use and which have been returned to the heap.
A somewhat neater method is to have the linker allocate the heap from whatever memory remains after stack and static memory allocation. That way the heap is always automatically as large as it can possibly be, letting you use all available memory with no effort. This requires modifying the application's linker script; linker scripts are specific to the particular toolchain and invariably somewhat arcane.
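On the C side, such a linker-allocated heap is usually exposed through symbols defined in the linker script. A hedged sketch (the symbol names here are hypothetical and vary by toolchain):

#include <stddef.h>

extern unsigned char __heap_start__; /* set by the linker: first free byte  */
extern unsigned char __heap_end__;   /* set by the linker: one past the end */

static unsigned char *heap_next;

void heap_init(void)
{
    heap_next = &__heap_start__; /* the allocator hands out memory from here */
}

size_t heap_remaining(void)
{
    return (size_t)(&__heap_end__ - heap_next);
}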
K&R includes a simple implementation of malloc, for example.