Memory allocation issues with the LPC1788 microcontroller - c

I'm fairly new to programming microcontrollers; I've been working with the LPC1788 for a couple of weeks now.
One problem I've been having recently is that I'm running out of memory much sooner than I expect to. I've tested how much memory seems to be available by seeing how large a contiguous block I can malloc, and the result is 972 bytes. Allocation starts at address 0x10000000 (the start of the on-chip SRAM, which should be around 64 kB on this board).
The program I'm working on at the moment is meant to act as a simple debugger that utilises the LCD and allows messages to be printed to it. I have one string that new messages are constantly appended to, and the whole message is then printed on the LCD. When the text grows past the bottom of the screen, the oldest messages (the ones nearest the top) are deleted until it fits. However, I can only add about 7 additional messages before it refuses to allocate more memory. If needed, the main.c for the project is hosted at http://pastebin.com/bwUdpnD3
Earlier I also started work on a project that uses the ThreadX RTOS to create and execute several threads. When I tried including use of the LCD in that program, I found memory to be very limited there as well. The LCD seems to store all pixel data starting from the SDRAM base address, but I'm not sure if that's the same thing as the SRAM I'm using.
What I need is a way to allocate enough memory for several threads to function or for large strings to be stored, while still being able to utilise the LCD. One possibility might be to use buffers or other areas of memory, but I'm not quite sure how to do that. Any help would be appreciated.
tl;dr: Quickly running out of allocatable memory on SRAM when trying to print out large strings on the LCD.
EDIT 1: A memory leak was noticed with the variable currMessage. I think that's been fixed now:
strcpy(&trimMessage[1], &currMessage[trimIndex+1]);

// Frees up the memory allocated to currMessage from the last iteration
// before assigning new memory.
free(currMessage);
currMessage = malloc((msgSize - trimIndex) * sizeof(char));

for (int i = 0; i < msgSize - trimIndex; i++)
{
    currMessage[i] = trimMessage[i];
}
EDIT 2: Implemented memory leak fixes. Program works a lot better now, and I feel pretty stupid.

You need to be careful when choosing to use dynamic memory allocation in an embedded environment, especially with constrained memory. You can very easily end up fragmenting the memory space such that the biggest hole left is 972 bytes.
If you must allocate from the heap, do it once, and then hang onto the memory--almost like a static buffer. If possible, use a static buffer and avoid the allocation altogether. If you must have dynamic allocation, keeping it to fixed-size blocks will help with the fragmentation.
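A minimal sketch of what a fixed-size block pool might look like (the names and sizes here are illustrative, not taken from the poster's code): because every allocation is the same size, repeated alloc/free cycles can never fragment the pool.

#include <stddef.h>

#define BLOCK_SIZE  64        /* illustrative payload size */
#define BLOCK_COUNT 32        /* illustrative pool depth   */

typedef union block {
    union block *next;        /* used while the block is on the free list */
    char data[BLOCK_SIZE];    /* used while the block is allocated        */
} block_t;

static block_t pool[BLOCK_COUNT];
static block_t *free_list;

void pool_init(void)
{
    for (int i = 0; i < BLOCK_COUNT - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[BLOCK_COUNT - 1].next = NULL;
    free_list = &pool[0];
}

void *pool_alloc(void)        /* returns NULL when the pool is exhausted */
{
    block_t *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

void pool_free(void *p)
{
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}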
Unfortunately, it does take a bit of engineering effort to overcome the fragmentation issue. It is worth the effort, though, as it makes the system much more robust.
As for SRAM vs SDRAM, they aren't the same. I'm not familiar with ThreadX, or whether it has a Board Support Package (BSP) for your board, but in general, SDRAM has to be set up. That means the boot code has to initialize the memory controller, set up the timings, and then enable that space. Depending on your heap implementation, you need to add it dynamically or--more likely--you need to compile with your heap space pointing to where it will ultimately live (in SDRAM space). Then, you have to make sure to come up and get the memory controller configured and activated before actually using the heap.
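As a rough illustration only (the exact hook depends on your C library; this assumes a newlib-style _sbrk and the LPC1788's dynamic memory window at 0xA0000000), pointing the heap at external SDRAM can look something like this:

#include <errno.h>

#define SDRAM_BASE ((char *)0xA0000000)   /* EMC dynamic memory CS0 on the LPC1788 */
#define SDRAM_SIZE (32 * 1024 * 1024)     /* illustrative: a 32 MB part */

static char *heap_end = SDRAM_BASE;

/* newlib-style sbrk hook; the EMC/SDRAM controller must already have been
   initialised by the startup code before the first malloc() call. */
void *_sbrk(int incr)
{
    if (heap_end + incr > SDRAM_BASE + SDRAM_SIZE) {
        errno = ENOMEM;
        return (void *)-1;
    }
    char *prev = heap_end;
    heap_end += incr;
    return prev;
}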
One other thing to watch out for: you may actually be running code from the SRAM space, and some of that space is also reserved for processor exception tables. That whole space may not be available, and it may be visible at two different addresses (0x00000000 and 0x10000000, for instance). I know this is common on some other ARM9 processors. You can boot from flash, which gets mapped initially into the 0x00000000 space, and then you do a song and dance to copy the booter into SRAM and map SRAM into that space. At that point, you can boot into something like Linux, which expects to be able to update the tables that live down at 0.
BTW, it does look like you have some memory leaks in the code you posted. Namely, currMessage is never freed... only overwritten with a new pointer. Those blocks are then lost forever.

Related

OSDev: Why does my memory allocation function suddenly stop working in the AHCI initialization function?

After my kernel calls the AHCIInit() function inside of the ArchInit() function, I get a page fault in one of the MemAllocate() calls, and this only happens on real machines; I couldn't reproduce it on VirtualBox, VMware, or QEMU.
I tried debugging the code, unit testing the memory allocator, and removing everything from the kernel except the memory manager and the AHCI driver itself; the only thing I discovered is that something is corrupting the allocation blocks, making MemAllocate() page fault.
The whole kernel source is at https://github.com/CHOSTeam/CHicago-Kernel, but the main files where the problem probably occurs are:
https://github.com/CHOSTeam/CHicago-Kernel/blob/master/mm/alloc.c
https://github.com/CHOSTeam/CHicago-Kernel/blob/master/arch/x86/io/ahci.c
I expected AHCIInit() to detect and initialize all the AHCI devices and the boot to continue until it reaches the session manager or the kernel shell, but on real computers it page faults before even initializing the scheduler (so no, the problem isn't my scheduler).
If it works in emulators but doesn't work on real hardware, then the first things I'd suspect are:
bugs in physical memory management. For example, physical memory manager initialization not rounding the "starting address of usable RAM area" up to a page boundary or not rounding the "ending address of usable RAM area" down to a page boundary (see the sketch after this list), causing a "half usable RAM and half not usable RAM" page to be allocated by the heap later (where it works on emulators because the memory map provided by firmware happens to describe areas that are nicely aligned anyway).
a bug where RAM is assumed to contain zeros but may not be (where it works on emulators because they tend to leave almost all RAM full of zeros).
a race condition (where different timing causes different behavior).
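For the rounding issue in the first point, here is a sketch of the kind of clamping the physical memory manager initialization could do (add_usable_range() is a hypothetical helper, not from the posted kernel):

#include <stdint.h>

#define PAGE_SIZE 0x1000u

extern void add_usable_range(uintptr_t start, uintptr_t end);   /* hypothetical PMM call */

/* Clamp a firmware-reported RAM area to whole pages before handing it
   to the physical memory manager. */
static void add_ram_area(uintptr_t area_start, uintptr_t area_end)
{
    uintptr_t start = (area_start + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1); /* round up   */
    uintptr_t end   = area_end & ~(uintptr_t)(PAGE_SIZE - 1);                     /* round down */

    if (start < end) {
        add_usable_range(start, end);
    }
}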
However, this is a monolithic kernel, which means that you'll continually be facing "any piece of code in kernel-space did something that caused problems for any other piece of code anywhere else"; and there's a bunch of common bugs with memory usage (e.g. accidentally writing past the end of what you allocated). For this reason I'd want better tools to help diagnose problems, especially for the heap.
Specifically, for the heap I'd start with canaries (e.g. put a magic number like 0xFEEDFACE before each block of memory in the heap, and another different number after each block of memory in the heap; and then check that the magic numbers are still present and correct where convenient - e.g. when blocks are freed or resized). Then I'd write a "check_heap()" function that scans through everything checking as much as possible (the canaries, if statistics like "number of free blocks" are actually correct, etc). The idea being that (whenever you suspect something might have corrupted the heap) you can insert a call to the "check_heap()" function, and move that call around until you find out which piece of code caused the heap corruption.

I'd also suggest having a "what" parameter in your "kmalloc() or equivalent" (e.g. so you can do things like myFooStructure = kmalloc("Foo Structure", sizeof(struct foo));), where the provided "what string" is stored in the allocated block's meta-data, so that later on (when you find out the heap was corrupted) you can display the "what string" associated with the block before the corruption, and so that you can (e.g.) list how many of each type of thing there currently is to help determine what is leaking memory (e.g. if the number of "Foo Structure" blocks is continually increasing).

Of course these things can be (should be?) enabled/disabled by compile time options (e.g. #ifdef DEBUG_HEAP).
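A rough sketch of what such canaries and a check_heap() might look like, assuming a simple header-plus-tail-word block layout (not the layout of any particular allocator):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CANARY_HEAD 0xFEEDFACEu
#define CANARY_TAIL 0xDEADBEEFu

typedef struct block_header {
    uint32_t head_canary;          /* CANARY_HEAD while the block is intact  */
    size_t size;                   /* payload size in bytes                  */
    const char *what;              /* e.g. "Foo Structure", for leak hunting */
    struct block_header *next;     /* linked list of live blocks             */
} block_header_t;

/* A CANARY_TAIL word is written immediately after each payload at
   allocation time; the payload itself starts right after the header. */

static block_header_t *all_blocks;      /* every live allocation */

int check_heap(void)
{
    for (block_header_t *b = all_blocks; b != NULL; b = b->next) {
        uint32_t tail;
        memcpy(&tail, (char *)(b + 1) + b->size, sizeof(tail));   /* tail may be unaligned */
        if (b->head_canary != CANARY_HEAD || tail != CANARY_TAIL) {
            /* report b->what here, then panic or dump the heap */
            return 0;              /* corruption detected */
        }
    }
    return 1;                      /* heap looks intact */
}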
The other thing I'd recommend is self tests. These are like unit tests, but built directly into the kernel itself and always present. For example, you could write code to pound the daylights out of the heap (e.g. allocate random sized pieces of memory and fill them with something until you run out of memory, then free half of them, then allocate more until you run out of memory again, etc; while calling the "check_heap()" function between each step); where this code could/should take a "how much pounding" parameter (so you could spend a small amount of time doing the self test, or a huge amount of time doing the self test). You could also write code to pound the daylights out of the virtual memory manager, and the physical memory manager (and the scheduler, and ...). Then you could decide to always do a small amount of self testing each time the kernel boots and/or provide a special kernel parameter/option to enable "extremely thorough self test mode".
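A sketch of such a self test, assuming the two-argument kmalloc() described above, plus a kfree() and the check_heap() from the previous sketch (rand() and memset() stand in for whatever equivalents the kernel provides):

#include <stddef.h>

extern void *kmalloc(const char *what, size_t size);   /* the kernel's own allocator */
extern void  kfree(void *ptr);
extern int   check_heap(void);

void heap_self_test(int rounds)     /* "rounds" controls how much pounding is done */
{
    enum { MAX_PTRS = 256 };
    void *ptrs[MAX_PTRS];
    size_t sizes[MAX_PTRS];

    for (int r = 0; r < rounds; r++) {
        /* allocate random sizes until the table is full or the heap gives up */
        for (int i = 0; i < MAX_PTRS; i++) {
            sizes[i] = 16 + (rand() % 4096);
            ptrs[i]  = kmalloc("self test", sizes[i]);
            if (ptrs[i] != NULL)
                memset(ptrs[i], 0xA5, sizes[i]);   /* also catches "assumes zeroed RAM" bugs */
            check_heap();
        }

        /* free every other block, check, then free the rest and check again */
        for (int i = 0; i < MAX_PTRS; i += 2) {
            if (ptrs[i] != NULL)
                kfree(ptrs[i]);
            ptrs[i] = NULL;
        }
        check_heap();

        for (int i = 1; i < MAX_PTRS; i += 2) {
            if (ptrs[i] != NULL)
                kfree(ptrs[i]);
            ptrs[i] = NULL;
        }
        check_heap();
    }
}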
Don't forget that eventually (if/when the OS is released) you'll probably have to resort to "remote debugging via email" (e.g. where someone without any programming experience, who may not know very much English, sends you an email saying "OS not work"; and you have to try to figure out what is going wrong before the end user's "amount of hassle before giving up and not caring anymore" counter is depleted).

Read mmapped data memory efficiently

I want to mmap a big file into memory and parse it sequentially. As I understand it, once bytes have been lazily read into memory, they stay there. Is there a way to periodically tell the system to release the previously read contents?
This understanding is only a very superficial view.
To understand what really happens you have to take into account the difference between the virtual memory of your process and the actual real memory of the machine. Mapping a huge file means reserving space in your virtual address space. Whether anything is actually read at this point is probably platform-dependent.
When you actually access the data, the OS has to fill an actual page of memory. When you access other parts, those parts have to be brought into memory too. It's completely up to the OS when it will reuse the memory. Normally this happens when some data is accessed by you or another process and no free memory is available, but it could happen at any time. If you access it again later it might still be in memory, or it will be brought back by the OS. There is no way for your process to tell the difference.
In short: You don't need to care about that. The OS manages all that in the background.
One point might be that if you map a really huge file, this takes up space in your virtual address space, which is limited. So if you deal with many huge mappings and/or huge allocations, you might want to only map parts of the file at a given time.
ADDITION: after thinking a bit about it, I came up with a reason why it might be smarter to do it blockwise-sequential. Although I doubt you will be able to measure that.
Any reasonable OS will look for a block to unload when in need, in something like the following order:
1. unmapped files (not needed anymore)
2. LRU unmodified mapped file (can be retrieved from disc)
3. LRU modified mapped file (same as 2, but needs to be updated on disc before unload)
4. LRU allocated memory (needs to be written to swap)
So by unmapping blocks you know will never be used again as you go, you give the OS a hint that these can be freed earlier. This gives data that has been used less recently, but might be accessed in the future, a bigger chance of staying in memory.
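A sketch of that blockwise-sequential approach (consume() is a hypothetical parser; the window size is illustrative and kept a multiple of the page size so the mmap() offset stays aligned):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define WINDOW (64L * 1024 * 1024)     /* 64 MB per mapping, illustrative */

void consume(const char *data, size_t len);   /* hypothetical sequential parser */

void parse_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return;

    struct stat st;
    fstat(fd, &st);

    for (off_t off = 0; off < st.st_size; off += WINDOW) {
        size_t len = (st.st_size - off < WINDOW) ? (size_t)(st.st_size - off)
                                                 : (size_t)WINDOW;
        char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED)
            break;

        consume(p, len);

        munmap(p, len);   /* this window is never needed again: let the OS drop it first */
    }
    close(fd);
}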

madvise: not understood

CONTEXT:
I run on an old laptop. I only have 128 MB of RAM free out of 512 MB total. No money to buy more RAM.
I use mmap to help me circumvent this issue and it works quite well.
C code.
Debian, 64-bit.
PROBLEM:
Despite all my efforts, I am running out of memory pretty quickly right now, and I would like to know if I can release the mmapped regions I have already read in order to free my RAM.
I read that madvise could help, especially the option MADV_SEQUENTIAL.
But I don't quite understand the whole picture.
THE NEED:
To be able to free mmapped memory after the region is read, so that it doesn't fill my whole RAM with large files. I will not read it again soon, so it is garbage to me. It is pointless to keep it in RAM.
Update: I am not done with the file, so I don't want to call munmap. I have other things to do with it, but in other regions of it. Random reads.
For random read/write access to a mmap()ed file, MADV_SEQUENTIAL is probably not very useful (and may in fact cause undesired behavior). MADV_RANDOM or MADV_DONTNEED would be better options in this case. However, be aware that the kernel is free to ignore any madvise() - although in my understanding, Linux currently does not, as it tends to treat madvise() more as a command than an advisory...
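For a read-only mapping, a hedged sketch of telling the kernel it can drop the pages of a region you have finished with (the names map, offset and length are illustrative; the mapping itself stays valid, and a later access simply faults the data back in from the file):

#include <sys/mman.h>
#include <unistd.h>

/* map    : base address returned by mmap() for the whole file
   offset : byte offset (within the mapping) of the region just finished
   length : size of that region in bytes */
void drop_region(char *map, size_t offset, size_t length)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t start  = (offset / pagesz) * pagesz;   /* madvise() wants page alignment */
    size_t end    = offset + length;

    madvise(map + start, end - start, MADV_DONTNEED);
}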
Another option would be to mmap() only selected sections of the file as needed, and munmap() them as you're done with them, perhaps maintaining a pool of some small number of currently active mappings (i.e. mapping more than one region at once if needed, but still keeping it limited).
Of course you must free resources when you're done with them in order not to leak them and thus run out of available space too soon.
Not sure what the question is, if you know about mmap() then surely you know about munmap() too? It's right there on the same manual page.

malloc and obtaining recently freed memory

I am allocating the array and freeing it on every callback of an audio thread. The main user thread (a web browser) is constantly allocating and deallocating memory based on user input. I am sending the uninitialized float array to the audio card (there's an example on the page linked from my profile). The idea is to hear program state changes.
When I call malloc(sizeof(float)*256*13) or smaller, I get an array filled with a wide range of floats with a seemingly random distribution. It is not right to call it random - presumably this comes from whatever the memory block previously held. This is the behavior I expected and want to exploit. However, when I do malloc(sizeof(float)*256*14) or larger, I get an array filled only with zeros. I would like to know why this cliff exists and if there's something I can do to get around it. I know it is undefined behavior per the standard, but I'm hoping someone who knows the implementation of malloc on some system might have an explanation.
Does this mean malloc is also memsetting the block to zero for larger sizes? This would be surprising since it wouldn't be efficient. Even if there are more chunks of memory zeroed out, I'd expect something to happen sometimes, since the arrays are constantly changing.
If possible I would like to be able to obtain chunks of memory that are reallocated over recently freed memory, so any alternatives would be welcomed.
I guess this is a strange question for some because my goal is to explore undefined behavior and use bad programming practices deliberately, but this is the application I am interested in making, so please bear with me and don't tell me not to use uninitialized arrays. I'm developing on Mac OS X 10.5.
Most likely, the larger allocations result in the heap manager directly requesting pages of virtual address space from the kernel. Freeing will return that address space back to the kernel. The kernel must zero all pages that are allocated for a process - this is to prevent data leaking from one process to another.
Smaller allocations are handled by the user-mode heap manager within the process, by taking these larger page allocations from the kernel, carving them up into smaller blocks, and reusing blocks on subsequent allocations. These do not need to be zero-initialized, since the memory contents always come from your own process.
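A small experiment along those lines (this only probes one particular malloc implementation's behaviour and deliberately reads uninitialized memory, so treat the result as an observation, not a guarantee):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a block, scribble in it, free it, allocate the same size again
   and see whether the old contents reappear. Small blocks are often recycled
   by the user-space heap; large blocks tend to come back from the kernel as
   freshly zeroed pages. */
static int old_data_visible(size_t n)
{
    char *p = malloc(n);
    memset(p, 0xAB, n);
    free(p);

    char *q = malloc(n);                 /* may or may not be the same block   */
    int reused = (q[0] == (char)0xAB);   /* reading uninitialized memory is UB; done on purpose here */
    free(q);
    return reused;
}

int main(void)
{
    printf("small (sizeof(float)*256*13): old data visible? %d\n",
           old_data_visible(sizeof(float) * 256 * 13));
    printf("large (sizeof(float)*256*14): old data visible? %d\n",
           old_data_visible(sizeof(float) * 256 * 14));
    return 0;
}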
What you'll probably find is that previous requests could be filled using smaller blocks joined together. But when you request the bigger amount, the existing free memory probably can't handle that much, and the request flips some inbuilt switch and goes directly to the OS.

Why does my C program not free memory as it should?

I have made a program in C and wanted to see how much memory it uses. I noticed that the memory usage grows during normal use (at launch it uses about 250 kB and now it's at 1.5 MB). As far as I know, I freed all the unused memory, and after some hours the app uses less memory. Could it be that the freed memory just goes from 'active' memory to 'wired' or something, so it's released when free space is needed?
By the way, my machine runs Mac OS X, if that's important.
How do you determine the memory usage? Have you tried using valgrind to locate potential memory leaks? It's really easy. Just start your application with valgrind, run it, and look at the well-structured output.
If you're looking at the memory usage from the OS, you are likely to see this behavior. Freed memory is not automatically returned to the OS, but normally stays with the process, and can be malloced later. What you see is usually the high-water mark of memory use.
As Konrad Rudolph suggested, use something that examines the memory from inside the process to look for memory leaks.
The C library does not usually return "small" allocations to the OS. Instead it keeps the memory around for the next time you use malloc.
However, many C libraries will release large blocks, so you could try doing a malloc of several megabytes and then freeing it.
On OSX you should be able to use MallocDebug.app if you have installed the Developer Tools from OSX (as you might have trouble finding a port of valgrind for OSX).
/Developer/Applications/PerformanceTools/MallocDebug.app
I agree with what everyone has already said, but I do want to add just a few clarifying remarks specific to OS X:
First, the operating system actually allocates memory using vm_allocate, which allocates entire pages at a time. Because there is a cost associated with this, like others have stated, the C library does not just deallocate the page when you return memory via free(3). Specifically, if there are other allocations within the memory page, it will not be released. Currently memory pages are 4096 bytes in Mac OS X. The number of bytes in a page can be determined programmatically with sysctl(2) or, more easily, with getpagesize(2). You can use this information to optimize your memory usage.
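For example, a quick query of the page size on Mac OS X might look like this (both sysctl and getpagesize are standard on that platform):

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/sysctl.h>

int main(void)
{
    int pagesize = 0;
    size_t len = sizeof(pagesize);
    int mib[2] = { CTL_HW, HW_PAGESIZE };

    sysctl(mib, 2, &pagesize, &len, NULL, 0);
    printf("sysctl:      %d bytes per page\n", pagesize);
    printf("getpagesize: %d bytes per page\n", getpagesize());
    return 0;
}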
Secondly, user-space applications do not wire memory. Generally the kernel wires memory for critical data structures. Wired memory is basically memory that can never be swapped out and will never generate a page fault. If, for some reason, a page fault is generated in a wired memory page, the kernel will panic and your computer will crash. If your application is increasing your computer's wired memory by a noticeable amount, it is a very bad sign. It generally means that your application is doing something that significantly grows kernel data structures, like allocating and not reaping hundreds of threads or child processes. (Of course, this is a general statement... in some cases, this growth is expected, like when developing a virtual host or something like that.)
In addition to what the others have already written:
malloc() allocates bigger chunks from the OS and hands them out in smaller pieces as you malloc() them. When free()ing, the piece first goes into a free list, for quick reuse by another malloc if the size fits. It may at this time be merged with another free item, to form bigger free blocks, to avoid fragmentation (a whole bunch of different algorithms exist there, from free lists to binary-sized fragments to hashing and whatnot).
When freed pieces arrive such that multiple fragments can be joined, free() usually does this, but sometimes fragments remain, depending on the size and order of the malloc() and free() calls. Also, only when a big free block like this has been created will it (sometimes) be returned to the OS as a block. But usually malloc() keeps things in its pocket, depending on the allocated/free ratio (many heuristics exist, and compile or flag options are often available).
Notice that there is not ONE malloc/free algorithm. There is a whole bunch of different implementations (and literature). Highly system, OS and library dependent.
