Use of memory pools in C (embedded systems)

I'm writing code for an embedded system (a Cortex-M4 MCU with very limited RAM, < 32 KB) where I need big chunks of memory for a very short and well-defined time, and I thought of using some kind of memory pool so that memory isn't wasted on functions and actions that run once or twice in a lifetime.
I've made some attempts, but I'd like to learn more about memory pools and rewrite my code properly.
I've seen examples where a pointer is used to point to the next available free chunk, but I don't understand how I can process data stored in that kind of pool.
For example, I can't use strstr on a string that is spread across more than one chunk. I would need to read chunk by chunk and copy the whole string into one larger array to carry on with further processing. Please correct me if I'm wrong.
So, if I've got it right: with a memory pool of 1024 bytes split into 32-byte chunks, that gives 32 chunks in total. If I want to store a string of total length, say, 256 chars (bytes), I'd need 8 chunks, but if I want to read the string I'd need to copy those 8 chunks into a 256-char array.
Am I missing something?
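For reference, here is a minimal sketch of the kind of fixed-size-chunk pool with a "next free chunk" pointer described above. All names (pool_init, pool_alloc, pool_free, CHUNK_SIZE, NUM_CHUNKS) are illustrative assumptions, not from any particular library, and there is no error checking:

#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 32
#define NUM_CHUNKS 32   /* 1024-byte pool, as in the question above */

static uint8_t pool[NUM_CHUNKS * CHUNK_SIZE];
static void *free_list;   /* points to the next available free chunk */

void pool_init(void)
{
    /* Thread every chunk onto a free list: the first bytes of each
       free chunk store a pointer to the next free chunk. */
    free_list = NULL;
    for (int i = NUM_CHUNKS - 1; i >= 0; --i) {
        void *chunk = &pool[i * CHUNK_SIZE];
        *(void **)chunk = free_list;
        free_list = chunk;
    }
}

void *pool_alloc(void)
{
    void *chunk = free_list;
    if (chunk != NULL)
        free_list = *(void **)chunk;   /* pop the head of the free list */
    return chunk;
}

void pool_free(void *chunk)
{
    *(void **)chunk = free_list;       /* push back onto the free list */
    free_list = chunk;
}

Note that the strstr concern is real: pool_alloc hands out isolated 32-byte chunks, so a 256-byte string stored this way occupies 8 unrelated chunks and has to be reassembled into one buffer (or allocated from a pool with a larger chunk size) before the standard string functions can be used on it.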

Related

How to handle memory management via mmap properly?

I'm trying to write my own malloc and free implementation for the sake of learning, using just mmap and munmap (since brk and sbrk are obsolete). I've read a fair amount of documentation on the subject, but every example I see either uses sbrk or doesn't explain very well how to handle large zones of mapped memory.
The idea of what I'm trying to write is this: I first map a big zone (e.g. 512 pages); this zone will contain all allocations between 1 and 992 bytes, in 16-byte increments. I'll do the same later with a 4096-page zone for bigger allocations (or mmap directly if the requested size is bigger than a page). So I need a way to store information about every chunk that I allocate or free.
My question is: how do I handle this information properly?
My problems are: if I create a linked list, how do I allocate more space for each node? Or do I need to copy it into the mapped zone? If so, how can I juggle between data space and reserved space? Or is it better to use a statically sized array? The problem with that is that my zone's size depends on the page size.
There are several possible implementations for an mmap-based malloc:
Sequential (first-fit, best-fit)
Idea: use a linked list of chunks, with the last chunk sized to the remaining space in your page.
struct chunk
{
    size_t size;          /* usable size of this chunk */
    struct chunk *next;   /* next chunk in the page (NULL at the end) */
    int is_free;          /* 1 if the chunk is available */
};
To allocate:
Iterate over your list looking for a suitable free chunk (optimizable).
If nothing is found, resize the last chunk to the required size and create a new free chunk out of the remaining space.
If you reach the end of the page (the size is too small and next is NULL), simply mmap a new page (optimizable: map a custom-sized region if the request is abnormally large...).
To free, it's even simpler: just set is_free to 1. Optionally, you can check whether the next chunk is also free and merge both into one bigger free chunk (watch out for page borders). A sketch of this scheme follows the pros and cons below.
Pros: easy to implement, trivial to understand, simple to tweak.
Cons: not very efficient (you may iterate your whole list to find a block), needs lots of optimization, hectic memory organization.
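As a rough illustration of the sequential first-fit scheme, here is a minimal sketch using the struct chunk above. The names my_malloc/my_free and the single fixed-size page are illustrative assumptions; error handling, multi-page tracking, and alignment of the returned pointers are all glossed over:

#include <stddef.h>
#include <sys/mman.h>

#define PAGE_SIZE 4096

static struct chunk *head;   /* first chunk of the (single) mapped page */

void *my_malloc(size_t size)
{
    if (head == NULL) {      /* lazily map the first page */
        head = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (head == MAP_FAILED)
            return NULL;
        head->size = PAGE_SIZE - sizeof(struct chunk);
        head->next = NULL;
        head->is_free = 1;
    }
    for (struct chunk *c = head; c != NULL; c = c->next) {
        if (c->is_free && c->size >= size + sizeof(struct chunk)) {
            /* Split: carve a new free chunk out of the leftover space. */
            struct chunk *rest = (struct chunk *)((char *)(c + 1) + size);
            rest->size = c->size - size - sizeof(struct chunk);
            rest->next = c->next;
            rest->is_free = 1;
            c->size = size;
            c->next = rest;
            c->is_free = 0;
            return c + 1;    /* usable memory starts after the header */
        }
    }
    return NULL;             /* a real version would mmap another page here */
}

void my_free(void *ptr)
{
    struct chunk *c = (struct chunk *)ptr - 1;
    c->is_free = 1;
    /* Optional merge with the next chunk if it is also free. */
    if (c->next != NULL && c->next->is_free) {
        c->size += sizeof(struct chunk) + c->next->size;
        c->next = c->next->next;
    }
}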
Binary buddies (I love binary arithmetic and recursion)
Idea: use powers of 2 as size units:
struct chunk
{
    size_t size;   /* always a power of 2 */
    int is_free;
};
The structure here does not need a next pointer, as you'll see.
The principle is the following:
You have a 4096-byte page; minus 16 bytes for metadata, that is 4080 usable bytes.
To allocate a small block, simply split the page into two 2048-byte chunks, split the first half again into two 1024-byte chunks, and so on, until you get a suitable usable space (the minimum being a 32-byte chunk with 16 usable bytes).
Every block, unless it is a full page, has a buddy.
You end up with a tree-like structure of paired blocks.
To access your buddy, use a binary XOR between your pointer and your block size.
Implementation:
Allocating a block of size size:
Compute the required block_size = 2^k, the smallest power of 2 with 2^k >= size + sizeof(struct chunk).
Find the smallest free block in the tree that fits block_size.
If it can get smaller, split it, recursively.
Freeing a block:
Set is_free to 1.
Check whether your buddy is free (XOR the address with the size; don't forget to verify that the buddy is the same size as your block).
If it is, merge and double the size. Recurse. (A sketch of the buddy computation follows the cons below.)
Pros: extremely fast and memory-efficient, clean.
Cons: complicated, with a few tricky cases (page borders and buddy sizes).
You need to keep a list of your pages.
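Here is a minimal sketch of the buddy computation and the free/merge step, reusing the two-field struct chunk above. It assumes each page is aligned to its own size, so XOR-ing a chunk's address with its size yields the buddy's address; the names buddy_of and buddy_free are illustrative:

#include <stdint.h>

#define PAGE_SIZE 4096

static struct chunk *buddy_of(struct chunk *c)
{
    /* Flip the address bit corresponding to the chunk's size. */
    return (struct chunk *)((uintptr_t)c ^ c->size);
}

void buddy_free(struct chunk *c)
{
    c->is_free = 1;
    struct chunk *b = buddy_of(c);
    /* Merge only while the buddy is free AND currently the same size:
       a buddy that has itself been split must not be merged. */
    while (c->size < PAGE_SIZE && b->is_free && b->size == c->size) {
        if (b < c)
            c = b;           /* the merged chunk starts at the lower address */
        c->size *= 2;
        b = buddy_of(c);
    }
}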
Buckets (I have a lot of time to waste)
This is the only method of the three I have not attempted to implement myself, so I can only speak to the theory:
struct bucket
{
    size_t buck_num;    /* number of data segments */
    size_t buck_size;   /* size of one data segment */
    void *page;         /* the page holding the segments */
    void *freeinfo;     /* bitset recording which segments are free */
};
From the start you have a few small pages, each split into blocks of a constant size (one page of 8-byte blocks, one of 16-byte blocks, one of 32-byte blocks, and so on).
The "freedom information" for those data buckets is stored in bitsets (structures representing a large set of bits), either at the start of each page or in a separate memory zone.
For example, for 512-byte buckets in a 4096-byte page, the bitset representing them would be 8 bits long:
supposing *freeinfo = 01001000, that would mean the second and fifth buckets are free.
Pros: by far the fastest and cleanest over the long run; most efficient for many small allocations.
Cons: very cumbersome to implement, quite heavy for a small program, and a separate memory space is needed for the bitsets. (A sketch of the bitset bookkeeping follows below.)
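As a rough sketch of that bitset bookkeeping, here is one way to allocate and free a segment within a bucket. The function names are illustrative, and the uint8_t bitset limits this version to buckets of at most 8 segments (a larger bucket would need a wider bitset):

#include <stddef.h>
#include <stdint.h>

/* Allocate one segment: scan the bitset for a set bit (bit i set means
   segment i is free), clear it, and return the segment's address. */
void *bucket_alloc(struct bucket *b)
{
    uint8_t *bits = b->freeinfo;
    for (size_t i = 0; i < b->buck_num; ++i) {
        if (*bits & (1u << i)) {
            *bits &= (uint8_t)~(1u << i);   /* mark segment i as used */
            return (char *)b->page + i * b->buck_size;
        }
    }
    return NULL;                            /* bucket is full */
}

void bucket_free(struct bucket *b, void *ptr)
{
    size_t i = (size_t)((char *)ptr - (char *)b->page) / b->buck_size;
    *(uint8_t *)b->freeinfo |= (uint8_t)(1u << i);   /* mark as free again */
}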
There are probably other algorithms and implementations, but these three are the most used, so I hope you can get a lead on what you want to do from this.

I need to increase the maximum possible array size

I have 4 GB of RAM installed on a Core 2 Duo PC with a 32-bit Windows 7 operating system. I have increased the paging file size up to 106110 MB, but after doing all this I am still not able to significantly increase the maximum array size.
Here is the output of MATLAB's memory command:
Maximum possible array: 129 MB (1.348e+08 bytes) *
Memory available for all arrays: 732 MB (7.673e+08 bytes) **
Memory used by MATLAB: 563 MB (5.899e+08 bytes)
Physical Memory (RAM): 3549 MB (3.722e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
Kindly help me at your earliest. I am not even able to read a file of more than 48 MB into a double array.
There are two things you can do to free up memory for MATLAB. Since you're using a 32-bit version of the program, you're normally limited to 2 GB of address space. Booting Windows with the /3GB switch makes an additional 1 GB of address space available to large-address-aware programs such as MATLAB.
Second, you should consider using the pack function, which rearranges variables in memory to free up larger contiguous blocks. The lack of contiguous space, more than anything, is what limits the size of an individual array.
Remember: you can figure out how many elements an array can hold by dividing the available memory by the size of the variable type. Double variables take up 8 bytes each, so your 129 MB (1.348e+08 bytes) of contiguous space allows around 16.85 million double values in a single array.
You can view information about memory usage using the memory functions included in MATLAB:
memory shows the memory information above;
inmem shows the variables and functions currently stored in memory;
clear removes specific variables or functions from memory.
You may also try the /3GB switch mentioned above; maybe it increases the available memory. Otherwise, switch to a 64-bit OS: your system currently wastes 547 MB of RAM simply because there are no addresses left for it.

Memory allocator for small chunks (typically <= 16 bytes, rarely >= 64 bytes, max 192) with a static heap

This allocator will be used inside an embedded system with static memory only (i.e. no system heap is available, so the 'heap' will simply be 'char heap[4096]').
There seem to be lots of "small memory allocators" around, but I'm looking for one that handles REALLY small allocations. I'm talking typical sizes of 16 bytes, with small CPU use and smaller memory use.
Considering that typical allocations are <= 16 bytes, rare allocations are <= 64 bytes, and the "one in a million" allocation is up to 192 bytes, I'm thinking of simply chopping those 4096 bytes into 256 chunks of 16 bytes each and keeping a bitmap plus a "next free chunk" pointer. Rather than searching, if the memory is available, the appropriate chunks are marked and the function returns the pointer. Only once the end of the heap is reached would it go searching for an appropriate slot of the required size. Due to the nature of the system, earlier blocks 'should' have been released by the time the "next free chunk" pointer arrives at the end of the 'heap'.
So,
Does anyone know something like this already exists?
If not, can anyone poke holes in my theory?
Or, can they suggest something better?
C only, no C++. (The entire application must fit into <= 64 KB, and there's about 40 KB of graphics so far...)
OP: can anyone poke holes in my theory?
Reading the first half, I thought out a solution using a bit array to record usage, and came up with effectively the same thing you outline in the second half.
So here is the hole: avoid hard-coding a 16-byte block. Allow your bitmap to work with, say, 20- or 24-byte blocks at the beginning of your development. During this time you may want to put tag information and sentinels on the edges of each block, so you can more readily track down double free(), usage outside the allocation, etc. Of course, the price is a smaller effective pool.
After your debug stage, go with your 16-byte solution with confidence.
Be sure to keep track of 0 <= total allocation <= (4096 - overhead) and allow a check of it against your bitmap.
For debugging, consider filling a freed block with a pattern like 0xDEAD to help flush out inadvertent use-after-free errors.
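A minimal sketch of the bitmap scheme under discussion, over the static heap from the question. The names tiny_alloc/tiny_free are illustrative, the caller must pass the allocation size to tiny_free (real code would store a length or the tags suggested above), and the debug sentinels are omitted:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_SIZE  4096
#define BLOCK_SIZE 16
#define NUM_BLOCKS (HEAP_SIZE / BLOCK_SIZE)   /* 256 blocks */

static char     heap[HEAP_SIZE];
static uint32_t used[NUM_BLOCKS / 32];        /* 1 bit per 16-byte block */

/* Allocate n contiguous blocks; returns NULL if no free run exists. */
void *tiny_alloc(size_t n)
{
    for (size_t start = 0; start + n <= NUM_BLOCKS; ++start) {
        size_t i;
        for (i = 0; i < n; ++i) {
            size_t b = start + i;
            if (used[b / 32] & (1u << (b % 32)))
                break;                        /* run interrupted: try later */
        }
        if (i == n) {                         /* found n consecutive free blocks */
            for (i = 0; i < n; ++i) {
                size_t b = start + i;
                used[b / 32] |= 1u << (b % 32);
            }
            return &heap[start * BLOCK_SIZE];
        }
    }
    return NULL;
}

void tiny_free(void *ptr, size_t n)
{
    size_t start = (size_t)((char *)ptr - heap) / BLOCK_SIZE;
    for (size_t i = 0; i < n; ++i) {
        size_t b = start + i;
        used[b / 32] &= ~(1u << (b % 32));
    }
    memset(ptr, 0xDD, n * BLOCK_SIZE);        /* poison freed memory, per the
                                                 0xDEAD suggestion above */
}

This linear scan is the simple fallback path; the "next free chunk" pointer from the question would sit on top of it to make the common case O(1).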

Large array declaration with gcc and its problems

I was writing code which requires a large int array to be allocated (on the order of 10^9 elements).
While doing so I faced several issues, and after reading around on Google I came to the following conclusions of my own. Can someone look at this and point out if I am missing something, and also suggest a better way to do this?
(Machine config: VM running Ubuntu 10.4, gcc 4.4.3, 32-bit, 2 GB RAM (though my host machine has 6 GB).)
1. I declared the array as unsigned long int with size 1*10^9. It didn't work: on compiling the code I got the error 'array size too long'.
So I searched for this and finally realized that I can't allocate that much memory on the stack, as my physical memory was 2 GB. (I had already tried declaring the array as a global variable, which would place it in the data segment instead of on the stack, but got the same error.)
2. So I tried allocating the same amount of memory using malloc, but again got an error, this time 'Cannot allocate memory' from malloc.
So after doing all this, my understanding/problems are as follows:
3. I can't allocate that much memory, be it stack or heap, as my physical memory is only 2 GB (is this the actual problem, or do other factors also govern this allocation?).
4. Is there any possible workaround where I can allocate memory of size 10^9 on a 2 GB machine? (I know allocating an array or memory area this big is neither good algorithm design nor efficient, but I just want to know the limits.)
5. Is there any better solution for allocating this much memory (I mean, should I use two smaller arrays/heap allocations instead of one big chunk)?
(NOTE: points 4 and 5 are two different approaches; I would appreciate suggestions for both.)
Many thanks.
P.S. Forgive me if I am being a novice.
You are compiling a 32-bit process, and there is simply not enough virtual address space for your huge data block. A 32-bit pointer can hold 2^32 distinct values, i.e. 4 GB. You can't allocate more than that because you would have no way to refer to the memory: each byte mapped into your process must have a unique address.
So nothing is going to make your data fit into a 4 GB address space: 10^9 four-byte ints is already about 4 GB on its own, before the program, libraries, and kernel reservation take their share. Even if your array were smaller than 4 GB, you might still have trouble allocating a single contiguous block for it.
You could build a 64-bit process, but you'd need enough physical memory to avoid disk thrashing when your array gets swapped. Or you could find a different algorithm that does not require such a huge block of memory; one common workaround is sketched below.
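As a sketch of the workaround asked about in point 5 of the question (several smaller blocks instead of one big chunk), the data can be stored as a table of independently malloc'd chunks and indexed as if it were flat. The names are illustrative. On a 32-bit build this still cannot hold 10^9 ints, since the total exceeds the address space, but it removes the contiguity requirement and scales naturally on a 64-bit build:

#include <stdlib.h>

#define CHUNK_ELEMS (1u << 20)   /* 1M ints (~4 MB) per chunk */

typedef struct {
    size_t n_chunks;
    int  **chunks;
} big_array;

/* Returns 0 on success, -1 on failure (a real version would clean up). */
int big_array_init(big_array *a, size_t total_elems)
{
    a->n_chunks = (total_elems + CHUNK_ELEMS - 1) / CHUNK_ELEMS;
    a->chunks = malloc(a->n_chunks * sizeof(int *));
    if (a->chunks == NULL)
        return -1;
    for (size_t i = 0; i < a->n_chunks; ++i) {
        a->chunks[i] = malloc(CHUNK_ELEMS * sizeof(int));
        if (a->chunks[i] == NULL)
            return -1;
    }
    return 0;
}

/* Index as if the chunks formed one flat array. */
static inline int *big_array_at(const big_array *a, size_t i)
{
    return &a->chunks[i / CHUNK_ELEMS][i % CHUNK_ELEMS];
}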

Determining size of bit vectors for memory management given hard limit on memory

After searching around a bit and consulting the Dinosaur Book, I've come to SO seeking wisdom. Note that this is somewhat homework-related, but it isn't actually a homework problem. Also, this is using the C programming language.
I'm working with a kernel that currently allocates memory in 4K chunks. In an attempt to cut down on wasted memory, I've written something malloc-like that grabs a 4K page and then gives out memory from it as needed. That part is currently working fine. I plan to have a linked list of pages of memory. Memory is handled as a char*, so my struct has a char* in it; it also currently has some ints describing the page, as well as a pointer to the next node.
The question is this: I plan to use a bit vector to keep track of free and used memory, and I want to figure out how many integers (4 bytes, 32 bits each) I need to keep track of all the 1-byte blocks in the page. So 1 bit in the bit vector will correspond to 1 byte in the page. The catch is that it must all fit within the 4K I've been allocated, so I need to figure out the number of integers necessary to satisfy the 1-bit-per-byte constraint while still fitting in the 4K.
Or rather, I need to maximize the "actual" memory available, while minimizing the number of integers required to map one bit per byte, with both parts ("actual" memory and bit vector) in the same page.
Due to the information about the page, and the pointers, I won't actually have 4K to play with, but something closer to 4062 bytes.
I believe this to be a linear programming problem, but the formulations of it that I've tried haven't worked out.
You want to use a bitmap to keep track of allocated bytes in a 4K page, and are wondering how big (in bytes) the bitmap should be? The answer is 456 (after rounding up), found by solving this equation, where X is the usable memory in bytes and X/8 is the size of the bitmap that covers it:
X + X/8 = 4096
which, multiplying through by 8, reduces to:
9X = 32768
giving X ≈ 3640 usable bytes and a bitmap of X/8 ≈ 456 bytes (see the small check below).
But... the whole approach of keeping a bitmap within each allocated page sounds very wrong to me. What if you want to allocate a 12K buffer?
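For what it's worth, here is the same computation as a small C check, generalized to any page size (the names are illustrative). Note that with integer arithmetic the 3640 usable bytes need exactly ceil(3640/8) = 455 bitmap bytes, one less than the rounded 456 above, leaving one byte of the page spare:

#include <stdio.h>

int main(void)
{
    /* Split page_size bytes into usable memory plus a 1-bit-per-byte
       bitmap: from x + x/8 <= page_size, x = 8 * page_size / 9. */
    unsigned page_size = 4096;
    unsigned usable = page_size * 8 / 9;   /* 3640 for a 4K page */
    unsigned bitmap = (usable + 7) / 8;    /* 455: rounds x/8 up  */
    printf("usable = %u bytes, bitmap = %u bytes, total = %u of %u\n",
           usable, bitmap, usable + bitmap, page_size);
    return 0;
}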
