"Out of Memory" after adding an array in Kernel

I'm trying to trace get_page_from_freelist().
I defined an integer pointer, initialized it with kmalloc() in mm/init-mm.c, and added some system calls to control it. After rebooting into the modified kernel, I got an "Out of Memory" error.
I reduced the array size to 4KB (512 entries), but it still shows the same error message. As far as I know, 4KB is very small for the kernel. How can I handle this problem?
My kernel version is 5.19.9 and I have 32GB of physical memory. I'm using 64-bit Ubuntu 22.04.
from mm/init-mm.c:
int trace_on;
int trace_idx;
int trace_mod;
int raw_trace[TRACE_SIZE][2];
void setup_initial_init_mm(void *start_code, void *end_code,
void *end_data, void *brk)
{
int i;
trace_on = 0;
trace_idx = 0;
trace_mod = 0;
for (i = 0; i < TRACE_SIZE; i++) {
raw_trace[i][trace_mod] = 0;
}
init_mm.start_code = (unsigned long)start_code;
init_mm.end_code = (unsigned long)end_code;
init_mm.end_data = (unsigned long)end_data;
init_mm.brk = (unsigned long)brk;
}
from include/linux/mm_types.h:
#define TRACE_SIZE 0x100
extern int trace_on;
extern int trace_idx;
extern int trace_mod;
extern int raw_trace[TRACE_SIZE][2];
from mm/page_alloc.c:
#include <linux/mm_types.h>
extern int trace_on;
extern int trace_idx;
extern int trace_mod;
extern int raw_trace[TRACE_SIZE][2];
static struct page *
get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
const struct alloc_context *ac)
{
struct zoneref *z;
struct zone *zone;
struct pglist_data *last_pgdat_dirty_limit = NULL;
bool no_fallback;
int i;
if (unlikely(!trace_on && trace_idx > 0)) {
if (unlikely(trace_idx == TRACE_SIZE))
trace_idx--;
for (i = 0; i <= trace_idx; i++) {
raw_trace[i][0] = 0;
raw_trace[i][1] = 0;
}
trace_idx = 0;
trace_mod = 0;
}
retry:
...
try_this_zone:
page = rmqueue(ac->preferred_zoneref->zone, zone, order,
gfp_mask, alloc_flags, ac->migratetype);
if (page) {
prep_new_page(page, order, gfp_mask, alloc_flags);
/*
* If this is a high-order atomic allocation then check
* if the pageblock should be reserved for the future
*/
if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
reserve_highatomic_pageblock(page, zone, order);
if (unlikely(trace_on)) {
if (unlikely(trace_idx >= TRACE_SIZE)) {
if (trace_mod) trace_mod = 0;
else trace_mod = 1;
trace_idx = 0;
}
raw_trace[trace_idx++][trace_mod] = page_to_phys(page);
}
...
I don't think the array size or the use of kmalloc() is the issue, because I've already tried reducing the size and allocating the array statically.
The code above is the static allocation version of my modification.
Originally, I allocated the buffer with kmalloc() in setup_initial_init_mm().

Related

Why does LC_SYMTAB have invalid stroff/strsize but only for some loaded images?

I wrote the below program to iterate over all images in memory and dump their string tables.
#include <mach-o/dyld.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv) {
uint32_t count = _dyld_image_count();
for (uint32_t i = 0 ; i < count ; i++) {
const char* imageName = _dyld_get_image_name(i);
printf("IMAGE[%u]=%s\n", i, imageName);
const struct mach_header* header = _dyld_get_image_header(i);
if (header->magic != MH_MAGIC_64)
continue;
struct mach_header_64* header64 = (struct mach_header_64*)header;
char *ptr = ((void*)header64) + sizeof(struct mach_header_64);
for (uint32_t j = 0; j < header64->ncmds; j++) {
struct load_command *lc = (struct load_command *)ptr;
ptr += lc->cmdsize;
if (lc->cmd != LC_SYMTAB)
continue;
struct symtab_command* symtab = (struct symtab_command*)lc;
printf("\t\tLC_SYMTAB.stroff=%u\n", symtab->stroff);
printf("\t\tLC_SYMTAB.strsize=%u\n", symtab->strsize);
if (symtab->strsize > 100*1024*1024) {
printf("\t\tHUH? Don't believe string table is over 100MiB in size!\n");
continue;
}
char *strtab = (((void*)header64) + symtab->stroff);
uint32_t off = 0;
while (off < symtab->strsize) {
char *e = &(strtab[off]);
if (e[0] != 0)
printf("\t\tSTR[%u]=\"%s\"\n", off, e);
off += strlen(e) + 1;
}
}
}
return 0;
}
It seems to randomly work for some images, but for others the stroff/strsize have nonsensical values:
LC_SYMTAB.stroff=1266154560
LC_SYMTAB.strsize=143767728
It seems to always be the same two magic values, but I'm not sure if this is system-dependent in some way or if other people will get the same specific values.
If I comment out the check for strsize being over 100MiB, then printing the string table segfaults.
Most images seem to have this problem, but some don't. When I run it, I get the issue for 29 images out of 38.
I can't observe any pattern as to which do and which won't. What is going on here?
If it is relevant, I am testing on macOS 10.14.6 and compiling with Apple LLVM version 10.0.1 (clang-1001.0.46.4).

As you already worked out, those are from the dyld_shared_cache. And the 0x80000000 flag is indeed documented, in the headers shipped with Xcode or any semi-recent XNU source:
#define MH_DYLIB_IN_CACHE 0x80000000 /* Only for use on dylibs. When this bit
is set, the dylib is part of the dyld
shared cache, rather than loose in
the filesystem. */
As you've also discovered, the stroff/strsize values do not yield usable results when added to the dyld_shared_cache base. That is because those are not memory offsets, but file offsets. This is true for all Mach-O's, it's just often the case that the segments of non-cached binaries have the same relative position in file and memory offsets. But this is definitely not true for the shared cache.
To translate the file offset into a memory address, you'll have to parse the segments in the shared cache header. You can find struct definitions in the dyld source.

Here's a program which prints out the contents of the string table of the dyld shared cache.
My original program in the question can be enhanced to skip dumping string table of images with MH_DYLIB_IN_CACHE set, and combined with this program to dump the shared cache string table. (All images in the shared cache share the same string table.)
#include <mach-o/dyld.h>
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
const void* _dyld_get_shared_cache_range(size_t* cacheLen);
struct dyld_cache_header {
char magic[16];
uint32_t mappingOffset;
uint32_t mappingCount;
// Omitted remaining fields, not relevant to this task
};
struct dyld_cache_mapping_info {
uint64_t address;
uint64_t size;
uint64_t fileOffset;
uint32_t maxProt;
uint32_t initProt;
};
#ifndef MH_DYLIB_IN_CACHE
# define MH_DYLIB_IN_CACHE 0x80000000
#endif
// Finds first shared cache DYLD image. Any will do, just grab the first
const struct mach_header_64* findSharedCacheDyldImage(void) {
uint32_t count = _dyld_image_count();
for (uint32_t i = 0 ; i < count ; i++) {
const struct mach_header* header = _dyld_get_image_header(i);
if (header->magic != MH_MAGIC_64)
continue;
const struct mach_header_64* header64 = (const struct mach_header_64*)header;
if (!(header64->flags & MH_DYLIB_IN_CACHE))
continue;
return header64;
}
return NULL;
}
// Find first instance of given load command in image
const struct load_command* findFirstLoadCommand(const struct mach_header_64* header64, uint32_t cmd) {
const char *ptr = ((void*)header64) + sizeof(struct mach_header_64);
for (uint32_t j = 0; j < header64->ncmds; j++) {
const struct load_command *lc = (const struct load_command *)ptr;
ptr += lc->cmdsize;
if (lc->cmd == cmd)
return lc;
}
return NULL;
}
// Translates a shared cache file offset to a memory address
void *translateOffset(const struct dyld_cache_header *cache, uint64_t offset) {
const struct dyld_cache_mapping_info* mappings = (struct dyld_cache_mapping_info*)(((void*)cache) + cache->mappingOffset);
for (uint32_t i = 0; i < cache->mappingCount; i++) {
if (offset < mappings[i].fileOffset) continue;
if (offset >= (mappings[i].fileOffset + mappings[i].size)) continue;
return (void*)(mappings[i].address - mappings[0].address + (offset - mappings[i].fileOffset) + (uint64_t)cache);
}
return NULL;
}
int main(int argc, char** argv) {
size_t cacheLen;
const struct dyld_cache_header *cache = _dyld_get_shared_cache_range(&cacheLen);
const struct mach_header_64* sharedCacheDyldImage = findSharedCacheDyldImage();
const struct symtab_command* symtab = (const struct symtab_command*)findFirstLoadCommand(sharedCacheDyldImage,LC_SYMTAB);
const char *stringTbl = translateOffset(cache, symtab->stroff); // char*, so it can be indexed byte by byte
uint32_t off = 0;
while (off < symtab->strsize) {
const char *e = &(stringTbl[off]);
if (e[0] != 0)
printf("STR[%u]=\"%s\"\n", off, e);
off += strlen(e) + 1;
}
return 0;
}

Kernel module linked list strange behaviour

I am creating a linked list in a kernel module. Adding the first record works without problems. In the module's init function I also initialize the map with the CreateMap function:
typedef struct IP4SYN{
unsigned int KaynakIP;
u_int16_t KaynakPort;
u_int16_t HedefPort;
} IP4SYN;
typedef struct IP4Map{
unsigned int HedefIP;
IP4SYN * a;
size_t s;
size_t n;
} IP4Map;
typedef struct Sessions{
IP4Map * a;
size_t s;
size_t n;
} Sessions;
static size_t blocksize = (1024*102);
Sessions * Session_Map;
static unsigned int get_tsval(void){
return (unsigned int)(ktime_to_ns(ktime_get())>>10);
}
Sessions * CreateMap(size_t initial){
Sessions * pMap = (Sessions *)vmalloc(sizeof(Sessions));
if (initial > 0 && blocksize != initial) {
blocksize = initial;
}
pMap->a = (IP4Map *)vmalloc(sizeof(IP4Map)*blocksize);
memset(pMap->a, 0, sizeof(IP4Map)*blocksize);
pMap->s = blocksize;
pMap->n = 0;
pMap->a->a=(IP4SYN *)vmalloc(sizeof(IP4SYN)*blocksize);
memset(pMap->a->a, 0, sizeof(IP4SYN)*blocksize);
pMap->a->s = blocksize;
pMap->a->n = 0;
return pMap;
}
size_t HedefIPKayitNo(unsigned int HedefIP){
size_t beg = 0;
size_t end = Session_Map->n;
for (beg; beg < end; beg) {
IP4Map map = Session_Map->a[beg];
if(map.HedefIP==HedefIP){
return beg;
break;
}
}
return beg;
}
size_t YeniHedefIP(unsigned int HedefIP,size_t index){
Session_Map->a[index].HedefIP = HedefIP;
Session_Map->n++;
}
I want to check whether an IP is already in the list and, if it is not, add it. Strangely, this locks up the kernel on the second attempt; the first addition works perfectly. What could be the issue?
size_t IP_Index = HedefIPKayitNo(hedef_ip);
if(IP_Index == Session_Map->n ){
YeniHedefIP(hedef_ip,(IP_Index));
}

C VS2010 - Heap corruption freeing array of pointer on struct

Hello world (hi people),
First, this is my first post, so please be lenient.
As the title says, I get a heap corruption when I want to free my object(s). I spent a couple of hours trying to fix it, but I just can't see what's wrong, even though I'm sure it's obvious!
So that's why I come to you.
My goal is to create some functions that (poorly) mimic some std::vector functions in C. All objects are dynamically created. The implementation may be tricky due to the heavy use of pointers.
I get the heap corruption when I free the vector object, but I don't know if it comes from the pushBack or the destroy function.
Feel free to ask for more information. Constructive comments are welcome!
Here is some of the code:
Header content:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef struct _array
{
unsigned int size;
int *pData;
} tArray, *ptArray;
typedef struct _vector
{
unsigned int size;
ptArray *pData;
} tVector, *ptVector;
ptArray createArray(unsigned int size);
void destroyArray(ptArray *pArray);
ptVector createVector();
int pushBackArray(ptVector pVector, ptArray pArrayToAppend);
int popBackArray(ptVector pVector);
void destroyVector(ptVector *pVector);
Main content:
int main(char argc, char *argv[])
{
unsigned int size = 5, i;
time_t seed = NULL;
ptArray pArr = createArray(size);
ptVector pVec = createVector();
srand(seed);
for(i = 0; i < pArr->size; i++)
{
pArr->pData[i] = rand() % 9 + 1;
}
//destroyVector(&pVec); // Works at this point
pushBackArray(pVec, pArr);
destroyArray(&pArr);
destroyVector(&pVec); // Heap corruption on free pVector->pData
getchar();
return 0;
}
Body content:
ptArray createArray(unsigned int size)
{
ptArray pArray = (ptArray)calloc(1, sizeof(tArray));
if(pArray)
{
pArray->pData = (int*)malloc(size * sizeof(int));
if(pArray->pData)
pArray->size = size;
}
return pArray;
}
void destroyArray(ptArray *pArray)
{
if(pArray)
{
free((*pArray)->pData);
free((*pArray));
(*pArray) = NULL;
}
}
ptVector createVector()
{
ptVector pVector = (ptVector)calloc(1, sizeof(*pVector));
return pVector;
}
int pushBackArray(ptVector pVector, ptArray pArrayToAppend)
{
int res = 0;
if(pVector && pArrayToAppend)
{
pVector->pData = (ptArray*) realloc(pVector->pData
, sizeof(ptArray) * pVector->size + 1);
if(pVector->pData)
{
pVector->pData[pVector->size] = createArray(pArrayToAppend->size);
if(pVector->pData[pVector->size])
{
memcpy(pVector->pData[pVector->size]->pData
, pArrayToAppend->pData
, pVector->pData[pVector->size]->size * sizeof(int));
pVector->size++;
}
else
{
pVector->pData = (ptArray*) realloc(pVector->pData
, sizeof(ptArray) * pVector->size);
res = 3;
}
}
else
res = 2;
}
else
res = 1;
return res;
}
void destroyVector(ptVector *pVector)
{
if(pVector)
{
unsigned int i;
for(i = 0; i < (*pVector)->size; i++)
destroyArray( &((*pVector)->pData[i]) );
free((*pVector)->pData); // Heap Corruption throw
free(*pVector);
(*pVector) = NULL;
}
}

The size passed to realloc() in pushBackArray is incorrect. Change
sizeof(ptArray) * pVector->size + 1
to
sizeof(ptArray) * (pVector->size + 1)
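The difference comes down to operator precedence: `*` binds tighter than `+`, so the original expression asks for one extra byte rather than one extra element. A minimal sketch of the two size calculations follows; the helper names `wrong_bytes` and `right_bytes` are made up for illustration and are not part of the question's code.

```c
#include <stddef.h>

/* Bytes requested by the original expression: room for `count`
 * pointers plus ONE extra byte, because * binds tighter than +. */
size_t wrong_bytes(size_t count)
{
    return sizeof(void *) * count + 1;
}

/* Bytes actually needed to hold count + 1 pointers. */
size_t right_bytes(size_t count)
{
    return sizeof(void *) * (count + 1);
}
```

With an empty vector (`count == 0`), the original expression requests a single byte, yet pushBackArray then stores a whole pointer into that block. The write runs past the end of the allocation, which is presumably what the heap checker reports when pData is freed later.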

int queue with compare and swap has race condition

I have written a synchronised queue for holding integers and am faced with a weird race condition which I cannot seem to be able to understand.
Please do NOT post solutions, I know how to fix the code and make it work, I want to know what the race condition is and why it is not working as intended. Please help me understand what is going wrong and why.
First the important part of the code:
This assumes that the application will never put in more than the buffer can hold, so there is no check of the current buffer size.
static inline void int_queue_put_sync(struct int_queue_s * const __restrict int_queue, const long int value ) {
if (value) { // 0 values are not allowed to be put in
size_t write_offset; // holds a current copy of the array index where to put the element
for (;;) {
// retrieve up to date write_offset copy and apply power-of-two modulus
write_offset = int_queue->write_offset & int_queue->modulus;
// if that cell currently holds 0 (thus is empty)
if (!int_queue->int_container[write_offset])
// Attempt to compare and swap the new value in
if (__sync_bool_compare_and_swap(&(int_queue->int_container[write_offset]), (long int)0, value))
// if successful then this thread was the first to do this, terminate the loop, else try again
break;
}
// increment write offset signaling other threads where the next free cell is
int_queue->write_offset++;
// doing a synchronised increment here does not fix the race condition
}
}
This seems to have a rare race condition in which write_offset does not get incremented.
Tested on OS X gcc 4.2, Intel Core i5 quadcore and Linux Intel C Compiler 12 on RedHat 2.6.32 Intel(R) Xeon(R). Both produce race conditions.
Full source with test cases:
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <stdint.h>
// #include "int_queue.h"
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#ifndef INT_QUEUE_H
#define INT_QUEUE_H
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif
struct int_queue_s {
size_t size;
size_t modulus;
volatile size_t read_offset;
volatile size_t write_offset;
volatile long int int_container[0];
};
static inline void int_queue_put(struct int_queue_s * const __restrict int_queue, const long int value ) {
if (value) {
int_queue->int_container[int_queue->write_offset & int_queue->modulus] = value;
int_queue->write_offset++;
}
}
static inline void int_queue_put_sync(struct int_queue_s * const __restrict int_queue, const long int value ) {
if (value) {
size_t write_offset;
for (;;) {
write_offset = int_queue->write_offset & int_queue->modulus;
if (!int_queue->int_container[write_offset])
if (__sync_bool_compare_and_swap(&(int_queue->int_container[write_offset]), (long int)0, value))
break;
}
int_queue->write_offset++;
}
}
static inline long int int_queue_get(struct int_queue_s * const __restrict int_queue) {
size_t read_offset = int_queue->read_offset & int_queue->modulus;
if (int_queue->write_offset != int_queue->read_offset) {
const long int value = int_queue->int_container[read_offset];
int_queue->int_container[read_offset] = 0;
int_queue->read_offset++;
return value;
} else
return 0;
}
static inline long int int_queue_get_sync(struct int_queue_s * const __restrict int_queue) {
size_t read_offset;
long int volatile value;
for (;;) {
read_offset = int_queue->read_offset;
if (int_queue->write_offset == read_offset)
return 0;
read_offset &= int_queue->modulus;
value = int_queue->int_container[read_offset];
if (value)
if (__sync_bool_compare_and_swap(&(int_queue->int_container[read_offset]), (long int)value, (long int)0))
break;
}
int_queue->read_offset++;
return value;
}
static inline struct int_queue_s * int_queue_create(size_t num_values) {
struct int_queue_s * int_queue;
size_t modulus;
size_t temp = num_values + 1;
do {
modulus = temp;
temp--;
temp &= modulus;
} while (temp);
modulus <<= 1;
size_t int_queue_mem = sizeof(*int_queue) + ( sizeof(int_queue->int_container[0]) * modulus);
if (int_queue_mem % sysconf(_SC_PAGE_SIZE)) int_queue_mem += sysconf(_SC_PAGE_SIZE) - (int_queue_mem % sysconf(_SC_PAGE_SIZE));
int_queue = mmap(NULL, int_queue_mem, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE , -1, 0);
if (int_queue == MAP_FAILED)
return NULL;
int_queue->modulus = modulus-1;
int_queue->read_offset = 0;
int_queue->write_offset = 0;
int_queue->size = num_values;
memset((void*)int_queue->int_container, 0, sizeof(int_queue->int_container[0]) * modulus);
size_t i;
for (i = 0; i < num_values; ) {
int_queue_put(int_queue, ++i );
}
return int_queue;
}
#endif
void * test_int_queue_thread(struct int_queue_s * int_queue) {
long int value;
size_t i;
for (i = 0; i < 10000000; i++) {
int waited = -1;
do {
value = int_queue_get_sync(int_queue);
waited++;
} while (!value);
if (waited > 0) {
printf("waited %d cycles to get a new value\n", waited);
// continue;
}
// else {
printf("thread %p got value %ld, i = %zu\n", (void *)pthread_self(), value, i);
// }
int timesleep = rand();
timesleep &= 0xFFF;
usleep(timesleep);
int_queue_put_sync(int_queue, value);
printf("thread %p put value %ld back, i = %zu\n", (void *)pthread_self(), value, i);
}
return NULL;
}
int main(int argc, char ** argv) {
struct int_queue_s * int_queue = int_queue_create(2);
if (!int_queue) {
fprintf(stderr, "error initializing int_queue\n");
return -1;
}
srand(0);
long int value[100];
size_t i;
for (i = 0; i < 100; i++) {
value[0] = int_queue_get(int_queue);
if (!value[0]) {
printf("error getting value\n");
}
else {
printf("got value %ld\n", value[0]);
}
int_queue_put(int_queue, value[0]);
printf("put value %ld back successfully\n", value[0]);
}
pthread_t threads[100];
for (i = 0; i < 4; i++) {
pthread_create(threads + i, NULL, (void * (*)(void *))test_int_queue_thread, int_queue);
}
for (i = 0; i < 4; i++) {
pthread_join(threads[i], NULL);
}
return 0;
}

Interesting question. Here is a wild guess. :-)
It seems you need some synchronization between your read_offset and write_offset.
For example, here is a race that may be related or not. Between your compare-and-swap and the write_offset increment you may have a reader come in and set the value back to zero.
Writer-1: get write_offset=0
Writer-2: get write_offset=0
Writer-1: compare-and-swap at offset=0
Writer-1: Set write_offset=1
Reader-1: compare-and-swap at offset=0 (sets it back to zero)
Writer-2: compare-and-swap at offset=0 again even though write_offset=1
Writer-2: Set write_offset=2
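The sequence above can be replayed deterministically on a single thread to see what state it leaves behind. This is only a sketch: `struct replay_queue`, `CELLS`, `replay_interleaving` and the values 11 and 22 are invented for the illustration, and it relies on the same GCC/Clang `__sync_bool_compare_and_swap` builtin the question uses.

```c
#include <stddef.h>

#define CELLS 4               /* power of two, as in the question */

struct replay_queue {         /* cut-down stand-in for int_queue_s */
    size_t read_offset;
    size_t write_offset;
    long cells[CELLS];
};

/* Replays the interleaving step by step on one thread and
 * returns the final state so the caller can inspect it. */
struct replay_queue replay_interleaving(void)
{
    struct replay_queue q = {0, 0, {0}};
    size_t w1, w2, r1;
    long v;

    w1 = q.write_offset & (CELLS - 1);   /* Writer-1 reads index 0 */
    w2 = q.write_offset & (CELLS - 1);   /* Writer-2 reads index 0 */

    __sync_bool_compare_and_swap(&q.cells[w1], 0L, 11L); /* Writer-1 stores 11 */
    q.write_offset++;                                    /* write_offset = 1 */

    r1 = q.read_offset & (CELLS - 1);    /* Reader-1 takes cell 0 */
    v = q.cells[r1];
    __sync_bool_compare_and_swap(&q.cells[r1], v, 0L);   /* cell 0 is empty again */
    q.read_offset++;                                     /* read_offset = 1 */

    /* Writer-2 still holds the stale index 0. The cell is empty,
     * so its compare-and-swap succeeds there. */
    __sync_bool_compare_and_swap(&q.cells[w2], 0L, 22L);
    q.write_offset++;                                    /* write_offset = 2 */

    return q;
}
```

After the replay, write_offset is 2 and read_offset is 1, but the value 22 sits in cell 0 while cell 1 is still empty. A reader that now looks at cell 1 finds 0 and keeps retrying, which would be one possible source of the waiting seen in the test program.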

I believe that int_queue->write_offset++; is the problem: if two threads execute this instruction simultaneously, they will both load the same value from memory, increment it, and store the same result back (such that the variable only increases by one).
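To make the lost update concrete, here is a small single-threaded replay of two threads running `counter++` with their load and store steps interleaved, next to the same two increments done with the GCC/Clang builtin `__sync_fetch_and_add`. The function names are hypothetical, and this only illustrates the lost update itself; the question notes that a synchronised increment alone did not cure the queue.

```c
#include <stddef.h>

/* Replays two threads doing `counter++` with their load and store
 * steps interleaved. Single-threaded on purpose, so it is repeatable. */
size_t lost_update_replay(void)
{
    size_t counter = 0;
    size_t a = counter;   /* thread A loads 0 */
    size_t b = counter;   /* thread B loads 0 */
    counter = a + 1;      /* thread A stores 1 */
    counter = b + 1;      /* thread B stores 1, overwriting A's update */
    return counter;
}

/* The same two increments, each done as one indivisible
 * read-modify-write step. */
size_t atomic_replay(void)
{
    size_t counter = 0;
    __sync_fetch_and_add(&counter, 1);
    __sync_fetch_and_add(&counter, 1);
    return counter;
}
```

The first function ends with a counter of 1 even though two increments ran; the second ends with 2.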

In my opinion,
int_queue->write_offset++;
and
write_offset = int_queue->write_offset & int_queue->modulus;
are not thread-safe.

making your own malloc function in C

I need your help with this. I have an average knowledge of C, and here is the problem. I am about to use some benchmarks to test some computer architecture features (branch misses, cache misses) on a new processor. The benchmarks are in C, but I must not include any library calls. For example, I cannot use malloc because I am getting the error
"undefined reference to malloc"
even if I have included the library. So I have to write my own malloc. I do not need it to be super efficient; it just has to do the basics. My idea is to keep an address in memory and, every time malloc is called, return a pointer to that address and advance it by the requested size. malloc is called only twice in my program, so I do not even need much memory.
Can you help me with that? I have designed in Verilog and do not have much experience in C.
I have seen previous answers, but they all seem too complicated for me. Besides, I do not have access to the K&R book.
Cheers!
EDIT: maybe this can help you more:
I am not using gcc but the sde-gcc compiler. Does it make any difference? Maybe that's why I am getting an undefined reference to malloc?
EDIT2:
I am testing a MIPS architecture:
I have included:
#include <stdlib.h>
and the errors are:
undefined reference to malloc
relocation truncated to fit: R_MIPS_26 against malloc
and the compiler command is:
test.o: test.c cap.h
sde-gcc -c -o test.s test.c -EB -march=mips64 -mabi=64 -G -O -ggdb -O2 -S
sde-as -o test.o test.s -EB -march=mips64 -mabi=64 -G -O -ggdb
as_objects:=test.o init.o
EDIT 3:
OK, I used the implementation above and it runs without any problems. The problem is that in embedded programming you have to define everything you use yourself, so I defined my own malloc. sde-gcc didn't recognize the malloc function.

This is a very simple approach, which may get you past your 2 mallocs:
static unsigned char our_memory[1024 * 1024]; //reserve 1 MB for malloc
static size_t next_index = 0;
void *malloc(size_t sz)
{
void *mem;
if(sizeof our_memory - next_index < sz)
return NULL;
mem = &our_memory[next_index];
next_index += sz;
return mem;
}
void free(void *mem)
{
//we cheat, and don't free anything.
}
If required, you might need to align the memory you hand back, so that, for example, you always return addresses that are a multiple of 4, 8, 16 or whatever you require.
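One way to add that alignment is to round the start index up before handing out each block. The sketch below is an assumption-laden variant of the allocator above: the name `bump_alloc`, the `pool` buffer and the 16-byte `POOL_ALIGN` are hypothetical, and the `__attribute__((aligned))` on the buffer is a GCC/Clang extension used so that the buffer itself starts on an aligned address.

```c
#include <stddef.h>

#define POOL_SIZE  (1024 * 1024)
#define POOL_ALIGN 16            /* assumed alignment; must be a power of two */

/* The buffer itself must be aligned, otherwise aligned indexes
 * would not give aligned addresses (GCC/Clang attribute). */
static unsigned char pool[POOL_SIZE] __attribute__((aligned(POOL_ALIGN)));
static size_t pool_next = 0;

/* Hypothetical name, chosen to avoid clashing with a real malloc. */
void *bump_alloc(size_t sz)
{
    /* Round the start index up to the next multiple of POOL_ALIGN. */
    size_t start = (pool_next + (POOL_ALIGN - 1)) & ~(size_t)(POOL_ALIGN - 1);

    if (start > POOL_SIZE || POOL_SIZE - start < sz)
        return NULL;             /* not enough room left in the pool */
    pool_next = start + sz;
    return &pool[start];
}
```

A 1-byte request followed by a 5-byte request returns two addresses 16 bytes apart, at the cost of up to 15 wasted bytes per allocation.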

Here is a thread-safe version of nos's answer above, with a few changes:
#include <pthread.h>
#include <stddef.h>
static unsigned char our_memory[1024 * 1024]; //reserve 1 MB for malloc
static size_t next_index = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
void *malloc(size_t sz)
{
void *mem;
pthread_mutex_lock(&lock);
if(sizeof our_memory - next_index < sz){
pthread_mutex_unlock(&lock);
return NULL;
}
mem = &our_memory[next_index];
next_index += sz;
pthread_mutex_unlock(&lock);
return mem;
}
void free(void *mem)
{
//we cheat, and don't free anything.
}

You need to link against libc.a or the equivalent for your system. If you don't use the standard C library, you won't get any of the startup code that runs before the main function either, so your program will never run.
You could either allocate a block of static data and use that in place of malloc, like:
// char* fred = malloc(10000);
// equals
static char fred[10000];
or call the standard malloc for a large block of contiguous memory on startup and write your own malloc-type function to divide that up. In the second case, you would start benchmarking after calling the system's malloc so as not to affect the benchmarks.

I am sharing a complete approach for malloc and free that covers the three scenarios below. It is implemented using an array; the metadata could also be kept in a linked list.
There are three scenarios we have to cover:
1. Contiguous memory allocation: allocate memory in a contiguous manner.
2. Allocation between two allocated blocks: when there is free memory between two allocated blocks, use that chunk for the allocation.
3. Allocation from the initial block, when the initial block is free.
For details, see the diagram of the memory allocation algorithm.
Source code for malloc:
#define TRUE 1
#define FALSE 0
#define MAX_ALOCATION_ALLOWED 20
static unsigned char our_memory[1024 * 1024];
static int g_allocted_number = 0;
static int g_heap_base_address = 0;
typedef struct malloc_info
{
int address;
int size;
}malloc_info_t;
malloc_info_t metadata_info[MAX_ALOCATION_ALLOWED] ={0};
void* my_malloc(int size)
{
int j =0;
int index = 0 ;
int initial_gap =0;
int gap =0;
int flag = FALSE;
int initial_flag = FALSE;
void *address = NULL;
int heap_index = 0;
malloc_info_t temp_info = {0};
if(g_allocted_number >= MAX_ALOCATION_ALLOWED)
{
return NULL;
}
for(index = 0; index < g_allocted_number; index++)
{
if(metadata_info[index+1].address != 0 )
{
initial_gap = metadata_info[0].address - g_heap_base_address; /*Checked Initial Block (Case 3)*/
if(initial_gap >= size)
{
initial_flag = TRUE;
break;
}
else
{
gap = metadata_info[index+1].address - (metadata_info[index].address + metadata_info[index].size); /*Check Gap Between two allocated memory (Case 2)*/
if(gap >= size)
{
flag = TRUE;
break;
}
}
}
}
if(flag == TRUE) /*Get Index for allocating memory for case 2*/
{
heap_index = ((metadata_info[index].address + metadata_info[index].size) - g_heap_base_address);
for(j = MAX_ALOCATION_ALLOWED -1; j > index+1; j--)
{
memcpy(&metadata_info[j], &metadata_info[j-1], sizeof(malloc_info_t));
}
}
else if (initial_flag == TRUE) /*Get Index for allocating memory for case 3*/
{
heap_index = 0;
for(j = MAX_ALOCATION_ALLOWED -1; j > index+1; j--)
{
memcpy(&metadata_info[j], &metadata_info[j-1], sizeof(malloc_info_t));
}
}
else /*Get Index for allocating memory for case 1*/
{
if(g_allocted_number != 0)
{
heap_index = ((metadata_info[index -1].address + metadata_info[index-1].size) - g_heap_base_address);
}
else /* 0 th Location of Metadata for First time allocation*/
heap_index = 0;
}
address = &our_memory[heap_index];
metadata_info[index].address = g_heap_base_address + heap_index;
metadata_info[index].size = size;
g_allocted_number += 1;
return address;
}
Now the code for free:
void my_free(int address)
{
int i =0;
int copy_meta_data = FALSE;
for(i = 0; i < g_allocted_number; i++)
{
if(address == metadata_info[i].address)
{
// memset(&our_memory[metadata_info[i].address], 0, metadata_info[i].size);
g_allocted_number -= 1;
copy_meta_data = TRUE;
printf("g_allocted_number in free = %d %d\n", g_allocted_number, address);
break;
}
}
if(copy_meta_data == TRUE)
{
if(i == MAX_ALOCATION_ALLOWED -1)
{
metadata_info[i].address = 0;
metadata_info[i].size = 0;
}
else
memcpy(&metadata_info[i], &metadata_info[i+1], sizeof(malloc_info_t));
}
}
The test code is:
int main()
{
int *ptr =NULL;
int *ptr1 =NULL;
int *ptr2 =NULL;
int *ptr3 =NULL;
int *ptr4 =NULL;
int *ptr5 =NULL;
int *ptr6 =NULL;
g_heap_base_address = &our_memory[0];
ptr = my_malloc(20);
ptr1 = my_malloc(20);
ptr2 = my_malloc(20);
my_free(ptr);
ptr3 = my_malloc(10);
ptr4 = my_malloc(20);
ptr5 = my_malloc(20);
ptr6 = my_malloc(10);
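printf("Addresses are: %p, %p, %p, %p, %p, %p, %p\n", (void *)ptr, (void *)ptr1, (void *)ptr2, (void *)ptr3, (void *)ptr4, (void *)ptr5, (void *)ptr6); /* %p is the correct format for pointers */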
return 0;
}
