Malloc implementation - confused - c

I'm trying to create my own malloc() for practice. I got the code below from this thread.
#include <stddef.h>   /* size_t */
#include <unistd.h>   /* sbrk() */

typedef struct free_block {
    size_t size;
    struct free_block* next;
} free_block;

static free_block free_block_list_head = { 0, 0 };
// static const size_t overhead = sizeof(size_t);
static const size_t align_to = 16;

void* malloc(size_t size) {
    /* Round the request plus its header up to a multiple of align_to. */
    size = (size + sizeof(free_block) + (align_to - 1)) & ~ (align_to - 1);
    free_block* block = free_block_list_head.next;
    free_block** head = &(free_block_list_head.next);
    /* First fit: reuse the first previously freed block that is big enough. */
    while (block != 0) {
        if (block->size >= size) {
            *head = block->next;   /* unlink it from the free list */
            return ((char*)block) + sizeof(free_block);
        }
        head = &(block->next);
        block = block->next;
    }
    /* Nothing reusable: grow the data segment. */
    block = (free_block*)sbrk(size);
    if (block == (void*)-1)    /* sbrk() reports failure as (void*)-1, not NULL */
        return 0;
    block->size = size;
    return ((char*)block) + sizeof(free_block);
}

void free(void* ptr) {
    if (ptr == 0)              /* free(NULL) must be a no-op */
        return;
    /* Step back over the header and push the block onto the free list. */
    free_block* block = (free_block*)(((char*)ptr) - sizeof(free_block));
    block->next = free_block_list_head.next;
    free_block_list_head.next = block;
}
I'm confused about treating memory chunks as a linked list. It seems to me that we basically call sbrk() every time we need memory and we check if some of the memory that we requested before wasn't freed in the meantime.
But we have no way of checking other memory chunks that belong to other processes, we only check the memory that we requested before and added to our linked list.
If this is the case, is this optimal? Is this how the standard malloc() works?
Is there a way for us to work with all the memory on the heap?
Please explain like I'm 5, I'm having a hard time understanding this concept.

Extending process data segment doesn't affect other processes. On most (recent) architectures process memory model is flat, i.e. each process has a virtual address space (2^32 or 2^64 bytes). When process requests extra memory (page), a virtual memory is added to process. In fact, that doesn't mean any physical memory allocation occurs, as virtual memory can be mapped to swap file, or unmapped before use altogether (address is given to process, but no actual resources assigned to it). Kernel takes care of mapping physical address to virtual one as per need/per resource availability.
What does the algorithm do?
When user calls malloc the algorithm tries to find available empty block. In the beginning, there is none, so the algorithm tries to extend the process data segment.
However, you can see, that free doesn't release virtual memory (as it is not as trivial as allocating it), instead it adds this released block to a list of unused blocks.
So, when there are previously released blocks, malloc attempts to reuse them instead of extending the data segment.
Do standard mallocs work as above: no. The example you've provided is simple, yet really inefficient. There are many different algorithms available for memory management: small block heaps (when allocating data up to a certain amount has O(1) performance), thread-specific allocators (reducing congestion of accessing the heap from multiple threads), allocators that pre-allocate large chunks and then use them (similar to the above, yet more efficient) and others.
You can try to google for "memory heap implementation" for more info.

Linux- Out of Memory

I was trying to see the working of oom_kill by invoking it manually.
I allocated memory dynamically and tried to use it infinitely, with a while loop at first and then with a for loop, to test out of memory.
But in the first case, where I used the while loop, it threw a segmentation fault without swap and became unresponsive with swap, whereas with the for loop out of memory (oom_kill) was invoked.
Sample codes of both:
First case: while:
#include <stdlib.h>
#include <string.h>

int main (void) {
    char* p;
    while (1) {
        p=malloc(1<<20);          /* return value is not checked */
        memset (p, 0, (1<<20));
    }
}
Second case : for :
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef N
#define N 100000   /* placeholder for "some very large number"; override with -DN=... */
#endif

int main (void) {
    int i, n = 0;
    char *pp[N];
    for (n = 0; n < N; n++) {
        pp[n] = malloc(1<<20);
        if (pp[n] == NULL)
            break;
    }
    printf("malloc failure after %d MiB\n", n);
    for (i = 0; i < n; i++) {
        memset (pp[i], 0, (1<<20));
        printf("%d\n", i+1);
    }
    return 0;
}
where N is some very large number to invoke oom. I referred to this https://www.win.tue.nl/~aeb/linux/lk/lk-9.html for the 2nd case.
Why does it happen so? What is the mistake I'm making with the while loop?
Kernel version : 4.15
Why does it happen so?
To invoke the OOM killer, you need to have a situation where an access to memory cannot be fulfilled because there is not enough RAM available to fulfill the access. To do that, you want to first have large allocations (virtual memory mappings), then write to them.
The procedure to trigger the OOM killer is very simple:
Allocate lots of memory
Write to the allocated memory
You must have enough preallocated memory to cause everything evictable from RAM to be evicted (things like memory-mapped files), and all of swap to be used, before the kernel will invoke the OOM killer to provide more RAM/swap space to fulfill the backing to the virtual memory being written to.
What is the mistake I'm making with the while loop?
One bug, and one logical error.
The bug is, you do not check if malloc() returns NULL. malloc() returns NULL, when there is no more virtual memory available (or kernel refuses to provide more, for any reason) for the process. (In normal operation, the virtual memory available to each process is limited for non-privileged users; run e.g. ulimit -a to see the current limits.)
Because you access the memory immediately when allocated, the kernel simply refuses to allow your process more when it runs out of RAM and SWAP, and malloc() returns NULL. You then dereference the NULL pointer (by using memset(NULL, 0, 1<<20)), which causes the Segmentation fault.
The logical problem is that that scheme will not trigger the kernel OOM killer.
Remember, in order to trigger the kernel OOM killer, your process must have allocated memory that it has not accessed yet. The kernel invokes the OOM killer only when it has already provided the virtual memory, but cannot back it with actual RAM, because there is nothing evictable in RAM, and swap is already full.
In your case, the OOM killer will not get invoked, because when the kernel runs out of RAM and swap, it can simply refuse to provide more (virtual memory), leading to malloc() returning NULL.
(The Linux kernel memory subsystem is one that is actively developed, so the exact behaviour you see depends on both the kernel version, the amount of RAM and swap, and the memory manager tunables (e.g., those under /proc/sys/vm/). The above describes the most common, or typical cases and configurations.)
You don't need an external array, either. You can for example chain the allocations to a linked list:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>   /* memset() */

#ifndef SIZE
#define SIZE (2*1024*1024) /* 2 MiB */
#endif

struct list {
    struct list *next;
    size_t size;
    char data[];
};

struct list *allocate_node(const size_t size)
{
    struct list *new_node;
    new_node = malloc(sizeof (struct list) + size);
    if (!new_node)
        return NULL;
    new_node->next = NULL;
    new_node->size = size;
    return new_node;
}

int main(void)
{
    size_t used = 0;
    struct list *root = NULL, *curr;
    /* Allocate as much memory as possible. */
    while (1) {
        curr = allocate_node(SIZE - sizeof (struct list));
        if (!curr)
            break;
        /* Account for allocated total size */
        used += SIZE;
        /* Prepend to root list */
        curr->next = root;
        root = curr;
    }
    printf("Allocated %zu bytes.\n", used);
    fflush(stdout);
    /* Use all of the allocated memory. */
    for (curr = root; curr != NULL; curr = curr->next)
        if (curr->size > 0)
            memset(curr->data, ~(unsigned char)0, curr->size);
    printf("Wrote to %zu bytes of allocated memory. Done.\n", used);
    fflush(stdout);
    return EXIT_SUCCESS;
}
Note, the above code is untested, and even uncompiled, but the logic is sound. If you find a bug in it, or have some other issue with it, let me know in a comment so I can verify and fix.
The document you're reading is from 2003. The impossibly large number it chose to allocate was 10,000 MiB.
Today, in 2018, when new computers are likely to come with 16GiB of RAM, this kind of allocation could definitely succeed without issues.
What is the mistake I'm making with the while loop?
The segmentation fault is likely the result of passing a null pointer to memset(), since malloc() will return NULL on error.
Your second example avoids this error by always checking the return value from malloc().
I used the while loop it ... became unresponsive with swap ...
From the very document that you mentioned that you are reading:
Sometimes processes get a segfault when accessing memory that the kernel is unable to provide, sometimes they are killed, sometimes other processes are killed, sometimes the kernel hangs.
Other than mentioning the kernel version, you are very vague with the OS and system description. Presumably this is a 32-bit version?
There are actually two ways of running out of memory. Your program could exceed the amount of (virtual) memory that is allocated, or the system could actually run out of memory pages.
Note that availability of memory (pages) is a complex combination of physical memory size, swap space size, memory usage and process load.
Reference: When Linux Runs Out of Memory
by Mulyadi Santosa or here.

Obtain size of array via write permission check

To obtain the length of a null-terminated string, we simply write len = strlen(str). However, I often see posts here on SO saying that to get the size of an int array, for example, you need to keep track of it on your own, and that's what I do normally. But I have a question: could we obtain the size by using some sort of write permission check, that checks if we have write permissions to a block of memory? For example:
#include <stdio.h>
#include <stdbool.h>

int getSize(int *arr);
bool permissionTo(int *ptr);

int main(void)
{
    int arr[3] = {1,2,3};
    int size = getSize(arr) * sizeof(int);
    return 0;
}

int getSize(int *arr)
{
    int *ptr = arr;
    int size = 0;
    while( permissionTo(ptr) )
    {
        size++;
        ptr++;
    }
    return size;
}

bool permissionTo(int *ptr)
{
    /*............*/
}
No, you can't. Memory permissions don't have this granularity on most, if not all, architectures.
Almost all CPU architectures manage memory in pages. On most things you'll run into today one page is 4kB. There's no practical way to control permissions on anything smaller than that.
Most memory management is done by your libc allocating a largish chunk of memory from the kernel and then handing out smaller chunks of it to individual malloc calls. This is done for performance (among other things) because creating, removing or modifying a memory mapping is an expensive operation, especially on multiprocessor systems.
For the stack (as in your example), allocations are even simpler. The kernel knows that "this large area of memory will be used by the stack" and memory accesses to it simply allocate the necessary pages to back it. All the tracking your program does of stack allocations is one register.
If you are trying to achieve that an allocation becomes comfortable to use by carrying its own size around, then do this:
Wrap malloc and free by prefixing the memory with its size internally (written from memory, not tested yet):
#include <stdlib.h>

void* myMalloc(long numBytes) {
    char* mem = malloc(numBytes+sizeof(long));
    if (mem == NULL)
        return NULL;
    ((long*)mem)[0] = numBytes;     /* length lives just before the payload */
    return mem+sizeof(long);
}

void myFree(void* memory) {
    char* mem = (char*)memory-sizeof(long);
    free(mem);
}

long memlen(void* memory) {
    char* mem = (char*)memory-sizeof(long);
    return ((long*)mem)[0];
}
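An added example, not the answerer's code: the same size-prefix idea, with a header sized as a union with max_align_t so the payload keeps the alignment malloc() guarantees (a bare long prefix gives 8-byte alignment on 64-bit platforms where malloc() promises 16), plus a bounds-checked write that uses the stored length. The names (sized_malloc, sized_len, sized_write, ...) are this example's own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Header stored in front of every payload. The union makes it as large and as
   strictly aligned as max_align_t, so the payload after it is suitably aligned. */
union sized_header {
    size_t len;
    max_align_t align;
};

void *sized_malloc(size_t len)
{
    if (len > SIZE_MAX - sizeof(union sized_header))
        return NULL;                               /* refuse a size that would wrap */
    union sized_header *h = malloc(sizeof *h + len);
    if (h == NULL)
        return NULL;
    h->len = len;
    return h + 1;                                  /* payload starts right after the header */
}

size_t sized_len(const void *p)
{
    return ((const union sized_header *)p - 1)->len;
}

void sized_free(void *p)
{
    if (p != NULL)
        free((union sized_header *)p - 1);
}

/* Copy n bytes into the allocation at offset, using the stored length as the bound.
   Returns 0 on success, -1 if the write would leave the allocation. */
int sized_write(void *p, size_t offset, const void *src, size_t n)
{
    size_t len = sized_len(p);
    if (offset > len || n > len - offset)          /* written this way so it cannot overflow */
        return -1;
    memcpy((unsigned char *)p + offset, src, n);
    return 0;
}

/* 1 when the pointer meets the strictest alignment the platform requires. */
int sized_is_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(max_align_t) == 0;
}

int main(void)
{
    void *p = sized_malloc(100);
    if (p == NULL)
        return 1;
    printf("len=%zu aligned=%d in-bounds=%d out-of-bounds=%d\n", sized_len(p), sized_is_aligned(p),
           sized_write(p, 96, "abcd", 4), sized_write(p, 97, "abcd", 4));
    sized_free(p);
    return 0;
}
```

The wrapper still cannot answer the question's original request for an arbitrary pointer: sized_len() only works on pointers that came from sized_malloc(), because the length is something the allocator recorded, not something the hardware can report.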

Why doesn't this memory eater really eat memory?

I want to create a program that will simulate an out-of-memory (OOM) situation on a Unix server. I created this super-simple memory eater:
#include <stdio.h>
#include <stdlib.h>

unsigned long long memory_to_eat = 1024 * 50000;
size_t eaten_memory = 0;
void *memory = NULL;

int eat_kilobyte()
{
    memory = realloc(memory, (eaten_memory * 1024) + 1024);
    if (memory == NULL)
    {
        // realloc failed here - we probably can't allocate more memory for whatever reason
        return 1;
    }
    else
    {
        eaten_memory++;
        return 0;
    }
}

int main(int argc, char **argv)
{
    printf("I will try to eat %i kb of ram\n", memory_to_eat);
    int megabyte = 0;
    while (memory_to_eat > 0)
    {
        memory_to_eat--;
        if (eat_kilobyte())
        {
            printf("Failed to allocate more memory! Stucked at %i kb :(\n", eaten_memory);
            return 200;
        }
        if (megabyte++ >= 1024)
        {
            printf("Eaten 1 MB of ram\n");
            megabyte = 0;
        }
    }
    printf("Successfully eaten requested memory!\n");
    free(memory);
    return 0;
}
It eats as much memory as defined in memory_to_eat which now is exactly 50 GB of RAM. It allocates memory by 1 MB and prints exactly the point where it fails to allocate more, so that I know which maximum value it managed to eat.
The problem is that it works. Even on a system with 1 GB of physical memory.
When I check top I see that the process eats 50 GB of virtual memory and only less than 1 MB of resident memory. Is there a way to create a memory eater that really does consume it?
System specifications: Linux kernel 3.16 (Debian) most likely with overcommit enabled (not sure how to check it out) with no swap and virtualized.
When your malloc() implementation requests memory from the system kernel (via an sbrk() or mmap() system call), the kernel only makes a note that you have requested the memory and where it is to be placed within your address space. It does not actually map those pages yet.
When the process subsequently accesses memory within the new region, the hardware recognizes a segmentation fault and alerts the kernel to the condition. The kernel then looks up the page in its own data structures, and finds that you should have a zero page there, so it maps in a zero page (possibly first evicting a page from page-cache) and returns from the interrupt. Your process does not realize that any of this happened; the kernel's operation is perfectly transparent (except for the short delay while the kernel does its work).
This optimization allows the system call to return very quickly, and, most importantly, it avoids committing any resources to your process when the mapping is made. This allows processes to reserve rather large buffers that they never need under normal circumstances, without fear of gobbling up too much memory.
So, if you want to program a memory eater, you absolutely have to actually do something with the memory you allocate. For this, you only need to add a single line to your code:
int eat_kilobyte()
{
    if (memory == NULL)
        memory = malloc(1024);
    else
        memory = realloc(memory, (eaten_memory * 1024) + 1024);
    if (memory == NULL)
    {
        return 1;
    }
    else
    {
        //Force the kernel to map the containing memory page.
        ((char*)memory)[1024*eaten_memory] = 42;
        eaten_memory++;
        return 0;
    }
}
Note that it is perfectly sufficient to write to a single byte within each page (which contains 4096 bytes on X86). That's because all memory allocation from the kernel to a process is done at memory page granularity, which is, in turn, because of the hardware that does not allow paging at smaller granularities.
All the virtual pages start out copy-on-write mapped to the same zeroed physical page. To use up physical pages, you can dirty them by writing something to each virtual page.
If running as root, you can use mlock(2) or mlockall(2) to have the kernel wire up the pages when they're allocated, without having to dirty them. (normal non-root users have a ulimit -l of only 64kiB.)
As many others suggested, it seems that the Linux kernel doesn't really allocate the memory unless you write to it
An improved version of the code, which does what the OP was wanting:
This also fixes the printf format string mismatches with the types of memory_to_eat and eaten_memory, using %zi to print size_t integers. The memory size to eat, in kiB, can optionally be specified as a command line arg.
The messy design using global variables, and growing by 1k instead of 4k pages, is unchanged.
#include <stdio.h>
#include <stdlib.h>

size_t memory_to_eat = 1024 * 50000;
size_t eaten_memory = 0;
char *memory = NULL;

void write_kilobyte(char *pointer, size_t offset)
{
    int size = 0;
    while (size < 1024)
    { // writing one byte per page is enough, this is overkill
        pointer[offset + (size_t) size++] = 1;
    }
}

int eat_kilobyte()
{
    if (memory == NULL)
    {
        memory = malloc(1024);
    } else
    {
        memory = realloc(memory, (eaten_memory * 1024) + 1024);
    }
    if (memory == NULL)
    {
        return 1;
    }
    else
    {
        write_kilobyte(memory, eaten_memory * 1024);
        eaten_memory++;
        return 0;
    }
}

int main(int argc, char **argv)
{
    if (argc >= 2)
        memory_to_eat = atoll(argv[1]);
    printf("I will try to eat %zi kb of ram\n", memory_to_eat);
    int megabyte = 0;
    int megabytes = 0;
    while (memory_to_eat-- > 0)
    {
        if (eat_kilobyte())
        {
            printf("Failed to allocate more memory at %zi kb :(\n", eaten_memory);
            return 200;
        }
        if (megabyte++ >= 1024)
        {
            megabytes++;
            printf("Eaten %i MB of ram\n", megabytes);
            megabyte = 0;
        }
    }
    printf("Successfully eaten requested memory!\n");
    free(memory);
    return 0;
}
A sensible optimisation is being made here. The runtime does not actually acquire the memory until you use it.
A simple memcpy will be sufficient to circumvent this optimisation. (You might find that calloc still optimises out the memory allocation until the point of use.)
Not sure about this one, but the only explanation that I can think of is that Linux is a copy-on-write operating system. When one calls fork, both processes point to the same physical memory. The memory is only copied once one process actually WRITES to the memory.
I think here, the actual physical memory is only allocated when one tries to write something to it. Calling sbrk or mmap may well only update the kernel's memory bookkeeping. The actual RAM may only be allocated when we actually try to access the memory.
Basic Answer
As mentioned by others, the allocation of memory, until used, does not always commit the necessary RAM. This happens if you allocate a buffer larger than one page (usually 4 KB on Linux).
One simple answer would be for your "eat memory" function to always allocate 1 KB instead of increasingly larger blocks. This is because each allocated block starts with a header (a size for allocated blocks). So allocating a buffer of a size equal to or less than one page will always commit all of those pages.
Following Your Idea
To optimize your code as much as possible, you want to allocate blocks of memory aligned to 1 page size.
From what I can see in your code, you use 1024. I would suggest that you use:
#include <unistd.h>   /* getpagesize() */

int size;
size_t block_size;
size = getpagesize();
block_size = size - sizeof(void *) * 2;
What voodoo magic is this sizeof(void *) * 2?! When using the default memory allocation library (i.e. not SAN, fence, valgrind, ...), there is a small header just before the pointer returned by malloc() which includes a pointer to the next block and a size.
struct mem_header { void * next_block; intptr_t size; };
Now, using block_size, all your malloc() should be aligned to the page size we found earlier.
If you want to properly align everything, the first allocation needs to use an aligned allocation:
char *p = NULL;
void *aligned = NULL;
/* posix_memalign() takes a void** and returns 0 on success or an errno value. */
if (posix_memalign(&aligned, size, block_size) == 0)
    p = aligned;
Further allocations (assuming your tool only does that) can use malloc(). They will be aligned.
p = malloc(block_size);
Note: please verify that it is indeed aligned on your system... it works on mine.
As a result you can simplify your loop with:
for(;;)
{
    p = malloc(block_size);
    *p = 1;
}
Until you create a thread, the malloc() does not use mutexes. But it still has to look for a free memory block. In your case, though, it will be one after the other and there will be no holes in the allocated memory so it will be pretty fast.
Can it be faster?
Further note about how memory is generally allocated in a Unix system:
the malloc() function and related functions will allocate a block in your heap, which at the start is pretty small (maybe 2 MB)
when the existing heap is full it gets grown using the sbrk() function; as far as your process is concerned, the memory address always increases, that's what sbrk() does (contrary to MS-Windows which allocates blocks all over the place)
using sbrk() once and then hitting the memory every "page size" bytes would be faster than using malloc()
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

char * p = malloc(size); // get current "highest address"
p += size;
p = (char*)((intptr_t)p & -size); // clear bits (alignment)
intptr_t total_mem = (intptr_t)50 * 1024 * 1024 * 1024; // 50 GB; too large for an int
void * start = sbrk(total_mem);
if (start == (void *)-1)
{
    // sbrk() failed; handle the error here
}
char * end = (char *)start + total_mem;
for(; p < end; p += size)
{
    *p = 1;
}
note that the malloc() above may give you the "wrong" start address. But your process really doesn't do much, so I think you'll always be safe. That for() loop, however, is going to be as fast as possible. As mentioned by others, you'll get the total_mem of virtual memory allocated "instantly" and then the RSS memory allocated each time you write to *p.
WARNING: Code not tested, use at your own risk.
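An added example, not from any of the answers, for the point made both in the OOM answer earlier on this page and in the answers here: allocation alone does not raise the resident set, the first write to each page does. It reads /proc/self/statm, so it is Linux-specific; rss_bytes() and touch_pages() are this example's own names, and the 16 MiB buffer size is a constructed choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size of this process in bytes (second field of /proc/self/statm,
   in pages), or -1 if /proc is not available. */
long rss_bytes(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    long total_pages, resident_pages;
    int n = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (n != 2)
        return -1;
    return resident_pages * sysconf(_SC_PAGESIZE);
}

/* Allocate `bytes`, record RSS, dirty one byte in every page, record RSS again.
   Returns the number of pages touched, or -1 if malloc() fails. */
long touch_pages(size_t bytes, long *rss_before, long *rss_after)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *raw = malloc(bytes);
    if (raw == NULL)
        return -1;
    volatile unsigned char *buf = raw;  /* volatile: the compiler may not drop the stores */
    *rss_before = rss_bytes();          /* after malloc(), before any write */
    long touched = 0;
    for (size_t off = 0; off < bytes; off += (size_t)page) {
        buf[off] = 1;                   /* the first write to a page faults it in */
        touched++;
    }
    *rss_after = rss_bytes();           /* the touched pages are now resident */
    free(raw);
    return touched;
}

int main(void)
{
    long before = 0, after = 0;
    long n = touch_pages((size_t)16 << 20, &before, &after);
    if (n < 0) {
        perror("malloc");
        return 1;
    }
    printf("touched %ld pages: RSS %ld -> %ld bytes\n", n, before, after);
    return 0;
}
```

Writing a single byte per page is enough, as the answers say; the RSS difference should come out close to the full 16 MiB even though only 4096 bytes of payload were actually written.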

C malloc, memory usage only when populating

I'm allocating some space with malloc when my app starts. If I don't populate this variable, top shows 0% of my memory used by this app, but if I start to populate this variable, top begins to show increased usage of RAM as I populate this array.
So my question is: shouldn't top show this space allocated by malloc as used space of my app? Why does it only show an increase of RAM usage from my app when I populate this variable?
I'm on Ubuntu 10.10 64-bit. Here is the code that populates it:
#include <stdint.h>

char pack(uint64_t list, char bits, uint64_t *list_compressed, char control, uint64_t *index){
    uint64_t a, rest;
    if(control == 0){
        a = list;
    }
    else{
        rest = list >> (64 - control);
        a = (control == 64 ? list_compressed[*index] : list_compressed[*index] + (list << control));
        if(control + bits >= 64){
            control = control - 64;
            //list_compressed[*index] = a;
            (*index)++;
            a = rest;
        }
    }
    //list_compressed[*index] = a;
    control = control + bits;
    return control;
}
The malloc'ed variable is list_compressed.
If I uncomment the list_compressed population the ram usage is increased, if I keep it commented the usage is 0%.
Short answer, no. On many OSs, when you call malloc, it doesn't directly allocate you the memory, but only when you access it.
From malloc man page:
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
Modern operating systems may just return a virtual memory address when you allocate, which doesn't actually point to the chunk of memory. It is only 'allocated' when you want to use it.

Redefining free memory function in C

I'm redefining memory functions in C and I wonder if this idea could work as implementation for the free() function:
#include <stdio.h>
#include <stddef.h>
#include <unistd.h>   /* sbrk() */

typedef struct _mem_dictionary
{
    void *addr;
    size_t size;
} mem_dictionary;

mem_dictionary *dictionary = NULL; //array of memory dictionaries
int dictionary_ct = 0; //dictionary struct counter

void *malloc(size_t size)
{
    void *return_ptr = (void *) sbrk(size);
    if (dictionary == NULL)
        dictionary = (void *) sbrk(1024 * sizeof(mem_dictionary));
    dictionary[dictionary_ct].addr = return_ptr;
    dictionary[dictionary_ct].size = size;
    dictionary_ct++;
    printf("malloc(): %p assigned memory\n",return_ptr); //note: printf may itself call malloc
    return return_ptr;
}

void free(void *ptr)
{
    int i;
    int flag = 0;
    for(i = 0; i < dictionary_ct ; i++){
        if(dictionary[i].addr == ptr){
            dictionary[i].addr=NULL;
            dictionary[i].size = 0;
            flag = 1;
            break;
        }
    }
    if(!flag){
        printf("Remember to free!\n");
    }
}
Thanks in advance!
No, it will not. The address you are "freeing" is effectively lost after such a call. How would you ever know that the particular chunk of memory is again available for allocation?
There has been a lot of research in this area, here is some overview - Fast Memory Allocation in Dr. Dobbs.
Edit 0:
You are wrong about sbrk(2) - it's not a "better malloc" and you cannot use it as such. That system call modifies the end of the process data segment.
A few things:
Where do you allocate the memory for the dictionary?
How do you allocate the memory that dictionary->addr is pointing at? Without having the code for your malloc it is not visible if your free would work.
Unless in your malloc function you're going through each and every memory address available to the process to check if it is not used by your dictionary, merely the assignment dictionary[i].addr=NULL would not "free" the memory, and definitely not keep it for reuse.
BTW, the printf function in your version of free would print Remember to free! when the user calls free on a pointer that is supposedly not allocated, right? Then why "remember to free"?
Edit:
So with that malloc function, no, your free does not free the memory. First of all, you're losing the address of the memory, so every time you call this malloc you're actually pushing the process break a little further, and never reuse freed memory locations. One way to solve this is to somehow keep track of locations that you have "freed" so that next time that malloc is called, you can check if you have enough available memory already allocated to the process, and then reuse those locations. Also, remember that sbrk is a wrapper around brk which is an expensive system call, you should optimize your malloc so that a big chunk of memory is requested from OS using sbrk and then just keep track of which part you're using, and which part is available.
