In my class we have an assignment and one of the questions states:
Memory fragmentation in C: Design, implement, and run a C program that does the following: it allocates memory for a sequence of 3m arrays of 500000 elements each; it then deallocates all even-numbered arrays and allocates a sequence of m arrays of 700000 elements each. Measure the time your program requires for the allocation of the first sequence and for the second sequence. Choose m so that you exhaust all of the main memory available to your program. Explain your timings.
My implementation of this is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    clock_t begin1, stop1, begin2, stop2;
    double tdif = 0, tdif2 = 0;
    for (int k = 0; k < 1000; k++) {
        double dif, dif2;
        const int m = 50000;

        begin1 = clock();
        printf("Step One\n");
        /* pointer table on the heap: 3*m pointers would overflow the stack */
        int **container = malloc(3 * m * sizeof(int *));
        for (int i = 0; i < 3 * m; i++) {
            container[i] = malloc(500000 * sizeof(int));
        }
        stop1 = clock();

        printf("Step Two\n");
        for (int i = 0; i < 3 * m; i += 2) {
            free(container[i]);
        }

        begin2 = clock();
        printf("Step Three\n");
        int **container2 = malloc(m * sizeof(int *));
        for (int i = 0; i < m; i++) {
            container2[i] = malloc(700000 * sizeof(int));
        }
        stop2 = clock();

        /* release everything so the next iteration starts fresh
           instead of leaking the odd-numbered and second-sequence arrays */
        for (int i = 1; i < 3 * m; i += 2)
            free(container[i]);
        free(container);
        for (int i = 0; i < m; i++)
            free(container2[i]);
        free(container2);

        dif  = (stop1 - begin1) / (double)CLOCKS_PER_SEC;
        dif2 = (stop2 - begin2) / (double)CLOCKS_PER_SEC;
        /* note: halving each round gives an exponentially weighted
           average of the runs, not a true mean over all 1000 */
        tdif += dif;
        tdif /= 2;
        tdif2 += dif2;
        tdif2 /= 2;
    }
    printf("To allocate the first sequence it took: %.5f\n", tdif);
    printf("To allocate the second sequence it took: %.5f\n", tdif2);
    system("pause");
    return 0;
}
I have changed this up a few different ways, but what I consistently see is that the initial allocation of the 3*m arrays of 500000 elements uses up all of the available main memory. When I then free the even-numbered arrays, the memory is not released back to the OS, so the subsequent allocation of the m arrays of 700000 elements is satisfied from the page file (swap). As a result, the program never actually demonstrates memory fragmentation.
The above code runs the experiment 1000 times and averages the timings, which takes quite a while. The first sequence took 2.06913 seconds on average and the second sequence took 0.67594 seconds. I expected the second sequence to take longer, to show how fragmentation works, but because swap is being used this does not occur. Is there a way around this, or is my assumption wrong?
I will ask the professor about what I have on Monday, but until then any help would be appreciated.
Many libc implementations (I think glibc included) don't release memory back to the OS when you call free(), but keep it around so the next allocation can be satisfied without a syscall. Also, because of the complexity of modern paging and virtual memory strategies, you can never be sure where anything is in physical memory, which makes it almost impossible to fragment it intentionally (even if it arrives fragmented). You have to remember that virtual memory and physical memory are different beasts.
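As an aside, on glibc you can explicitly ask the allocator to hand cached free memory back to the kernel with malloc_trim(). A minimal, glibc-specific sketch:

#include <malloc.h>   /* glibc-specific: malloc_trim */
#include <stdlib.h>

int main(void)
{
    /* many small blocks come from the heap arena; freeing them
       usually does not shrink the process immediately */
    void *p[1000];
    for (int i = 0; i < 1000; i++)
        p[i] = malloc(64 * 1024);
    for (int i = 0; i < 1000; i++)
        free(p[i]);

    /* ask glibc to return as much free arena memory as possible to
       the OS; returns 1 if any memory was released, 0 otherwise */
    malloc_trim(0);
    return 0;
}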
(The following is written for Linux, but probably applicable to Windows and OSX)
When your program makes the first allocations, let's say there is enough physical memory for the OS to squeeze all of the pages in. They aren't all next to each other in physical memory -- they are scattered wherever there is room. The OS then modifies the page table to create a set of contiguous virtual addresses that refer to those scattered pages. But here's the thing -- because you don't really use the first memory you allocate, it becomes a very good candidate for swapping out. So when you come along to do the next allocations, the OS, running out of memory, will probably swap out some of those pages to make room for the new ones. Because of this, you are actually measuring disk speed and the efficiency of the operating system's paging mechanism -- not fragmentation.
Remember, a contiguous range of virtual addresses is almost never physically contiguous in practice.
Related
In a C program I have transactions that require a lot of memory chunks, and I need to know whether there is an algorithm or best-practice technique for handling all of these malloc/free calls. I've used arrays to store the memory chunks, but at some point the array itself gets full, and reallocating the array is just more waste. What is the elegant way to handle this issue?
The best algorithm in this case would be a free-list allocator plus a binary search tree.
You ask the system for one big memory chunk and then allocate fixed-size blocks from that chunk. When the chunk is full, you allocate another one and link the chunks into a red-black or AVL interval tree (otherwise, finding the owning chunk during free by iterating over a chunk list becomes a bottleneck).
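A minimal single-threaded sketch of the fixed-size part of that design (the names, the 64-byte block size, and the plain chunk list standing in for the search tree are all illustrative assumptions, not a definitive implementation):

#include <stdlib.h>

#define CHUNK_BLOCKS 1024
#define BLOCK_SIZE   64          /* fixed allocation size served by this pool */

typedef struct Block { struct Block *next; } Block;

typedef struct Chunk {
    unsigned char *mem;          /* CHUNK_BLOCKS * BLOCK_SIZE bytes from the system */
    struct Chunk *next;          /* a real implementation would use an interval tree */
} Chunk;

static Chunk *chunks = NULL;     /* all chunks obtained from the system */
static Block *free_list = NULL;  /* blocks available for reuse */

static int add_chunk(void)
{
    Chunk *c = malloc(sizeof *c);
    if (!c) return -1;
    c->mem = malloc((size_t)CHUNK_BLOCKS * BLOCK_SIZE);
    if (!c->mem) { free(c); return -1; }
    c->next = chunks;
    chunks = c;
    /* thread every block of the new chunk onto the free list */
    for (int i = 0; i < CHUNK_BLOCKS; i++) {
        Block *b = (Block *)(c->mem + (size_t)i * BLOCK_SIZE);
        b->next = free_list;
        free_list = b;
    }
    return 0;
}

void *pool_alloc(void)
{
    if (!free_list && add_chunk() != 0)
        return NULL;
    Block *b = free_list;        /* O(1): pop the head of the free list */
    free_list = b->next;
    return b;
}

void pool_free(void *p)
{
    Block *b = p;                /* O(1): push back onto the free list */
    b->next = free_list;
    free_list = b;
}

With a single fixed block size, free never has to find the owning chunk, which is why pool_free is O(1) here; the interval tree only becomes necessary once you serve several sizes or must validate which chunk a pointer belongs to.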
At the same time, multithreading and thread synchronization become important. If you simply use mutexes or trivial spin-locks, you will find that your libc's malloc is much faster than your custom allocator. A solution is per-thread allocators (one allocator per thread, or per CPU core), combined with lock-free techniques such as hazard pointers. This adds another problem: when one thread allocates memory and another releases it, the free function has to find the allocator that owns the block, which requires either strict locking or lock-free data structures.
The best option: you can use jemalloc, tcmalloc, or any other general-purpose fast memory allocator to replace your libc's default allocator (e.g. ptmalloc in glibc), completely or partially.
If you need to track these various buffers as part of the functionality you are delivering, then you are going to need some kind of buffer-management functionality, which includes allocating memory for the buffer management itself.
However if you are worried about memory fragmentation, that is a different question.
I have seen a lot written about how using malloc() and free() leads to memory fragmentation along with various ways to reduce or eliminate this problem.
I suspect that years ago it may have been a problem. These days, however, I question whether most programmers can manage memory as well as the people who develop the runtimes of modern compilers. I am sure there are special applications, but compiler runtime developers are very much aware of memory fragmentation, and there are a number of approaches they use to mitigate the problem.
For instance, here is an older article from 2010 describing how a modern runtime implements malloc() and free(): A look at how malloc works on the Mac
Here is a bit about the GNU allocator.
This IBM article from November 2004 describes some of the considerations for memory management, Inside memory management, and provides what they call "code for a simplistic implementation of malloc and free to help demonstrate what is involved with managing memory." Note that this is a simplistic example meant to illustrate some of the issues, not a demonstration of current practice.
I did a quick console application with Visual Studio 2015 that called a C function in a C source file that interspersed malloc() and free() calls of various sizes. I ran this while watching the process in Windows Task Manager. Peak Working Set (Memory) maxed out at 34MB. Watching the Memory (Private Working Set) measurement, I saw it rise and fall as the program ran.
#include <stdlib.h>

/* interleave malloc() and free() calls of varying sizes to churn the heap */
void funcAlloc(void)
{
    char *p[50000] = { 0 };
    int i;

    for (i = 0; i < 50000; i += 2) {   /* allocate even slots */
        p[i] = malloc(32 + (i % 1000));
    }
    for (i = 0; i < 50000; i += 2) {   /* free even slots */
        free(p[i]); p[i] = 0;
    }
    for (i = 1; i < 50000; i += 2) {   /* allocate odd slots */
        p[i] = malloc(32 + (i % 1000));
    }
    for (i = 1; i < 50000; i += 2) {   /* free odd slots */
        free(p[i]); p[i] = 0;
    }
    for (i = 0; i < 50000; i++) {      /* allocate every slot */
        p[i] = malloc(32 + (i % 1000));
    }
    for (i = 0; i < 50000; i += 3) {   /* free every third slot, in three passes */
        free(p[i]); p[i] = 0;
    }
    for (i = 1; i < 50000; i += 3) {
        free(p[i]); p[i] = 0;
    }
    for (i = 2; i < 50000; i += 3) {
        free(p[i]); p[i] = 0;
    }
}

void funcMain(void)
{
    for (int i = 0; i < 5000; i++) {
        funcAlloc();
    }
}
Probably the only thing a programmer can do to help the memory allocator, under conditions that churn memory with malloc() and free(), is to use a set of standard buffer sizes for varying lengths of data.
For example, if you are creating temporary buffers for several structs of varying sizes, use one standard buffer size, the size of the largest struct, for all of them, so that the memory manager works with equally sized chunks of memory and can reuse freed chunks more efficiently. I have seen dynamic data structures take this approach, such as allocating dynamic strings with a minimum length of 32 characters, or rounding a buffer request up to a multiple of four or eight; a sketch of the rounding idea follows.
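A minimal sketch of the rounding idea (the 32-byte granularity is an arbitrary illustration):

#include <stdlib.h>

#define BUCKET 32   /* illustrative standard chunk granularity */

/* round a request up to the next multiple of BUCKET so the allocator
   sees a small set of standard sizes instead of many odd ones */
void *alloc_rounded(size_t n)
{
    size_t rounded = (n + BUCKET - 1) / BUCKET * BUCKET;
    return malloc(rounded);
}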
I would like to know whether I can choose the storage location of arrays in C. There are already a couple of questions on here with some helpful info, but I'm looking for some extra detail.
I have an embedded system with a soft-core ARM Cortex implemented on an FPGA.
On start-up, code is loaded from memory and executed by the processor. My code is in assembly and contains some C functions. One particular function is a UART interrupt handler, which I have included below.
#include <stdio.h>

void UART_ISR()
{
    int count, n = 1000, t1 = 0, t2 = 1, display = 0, y, z;
    int x[1000];                   /* storage for the first 1000 Fibonacci terms */

    x[0] = t1;                     /* arrays are 0-based: first two terms */
    x[1] = t2;
    printf("\n\nFibonacci Series: \n\n %d \n %d \n ", t1, t2);
    count = 2;                     /* count=2 because the first two terms are already displayed */
    while (count < n)
    {
        display = t1 + t2;
        t1 = t2;
        t2 = display;
        x[count] = t2;
        ++count;
        printf(" %d \n", display);
    }
    printf("\n\n Finished. Sequence written to memory. Reading sequence from memory.....:\n\n");
    for (z = 0; z < 10000; z++) {} /* crude delay (may be optimized away) */
    for (y = 0; y < 1000; y++) {   /* read the values back from memory */
        printf("%d \n", x[y]);
    }
}
So basically the first 1000 values of the Fibonacci series are printed and stored in the array x, and then the values are printed to the screen again after a short delay.
Please correct me if I'm wrong, but the values in the array x are stored on the stack as they are computed in the loop, and retrieved from the stack when the array is read back from memory.
Here is the memory map of the system:
0x0000_0000 to 0x0000_0be0 is the code
0x0000_0be0 to 0x0010_0be0 is 1MB heap
0x0010_0be0 to 0x0014_0be0 is 256KB stack
0x0014_0be0 to 0x03F_FFFF is off-chip RAM
Is there a function in C that allows me to store the array x in the off-chip RAM for later retrieval?
Please let me know if you need any more info
Thanks very much for helping
--W
No, not "in C" as in "specified by the language".
The C language doesn't care about where things are stored; it specifies nothing about the existence of RAM at particular addresses.
But actual implementations, in the form of compilers, assemblers and linkers, often care a great deal about this.
With gcc for instance, you can use the section variable attribute to force a variable into a particular section.
You can then control the linker to map that section to a particular memory area.
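For example (gcc-specific; the section name .ext_ram is an assumption, and the actual region name comes from your board's linker script):

/* force the array into a named section instead of the stack; in the
   linker script you would then map that section onto the off-chip RAM
   region, e.g.
       .ext_ram : { *(.ext_ram) } > EXTRAM
   where EXTRAM is a memory region you define to start at 0x0014_0be0 */
int x[1000] __attribute__((section(".ext_ram")));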
UPDATE:
The other way to do this is manually, by not letting the compiler in on the secret and doing it yourself.
Something like:
int *external_array = (int *)0x00140be0;  /* start of the off-chip RAM region in the map above */
memcpy(external_array, x, sizeof x);
will copy the required number of bytes to the external memory. You can then read it back by swapping the first two arguments in the memcpy() call.
Note that this is way more manual, low-level and fragile, compared to letting the compiler/linker dynamic duo Just Make it Work for you.
Also, it seems very unlikely that you want to do all of that work from an ISR.
With the following code, I am trying to fill an array with numbers and then sort them. But if I set a high array size (MAX), the program stops after the last 'randomly' generated number is printed and never gets to the sorting at all. Could anyone please give me a hand with this?
#include <stdio.h>

#define MAX 2000000

int a[MAX];
int rand_seed = 10;

/* from K&R - returns a random number between 0 and 62000.
   Note: this deliberately replaces the standard library's rand(). */
int rand();
int bubble_sort();

int main()
{
    int i;

    /* fill array */
    for (i = 0; i < MAX; i++)
    {
        a[i] = rand();
        printf(">%d= %d\n", i, a[i]);
    }

    bubble_sort();

    /* print sorted array */
    printf("--------------------\n");
    for (i = 0; i < MAX; i++)
        printf("%d\n", a[i]);

    return 0;
}

int rand()
{
    rand_seed = rand_seed * 1103515245 + 12345;
    return (unsigned int)(rand_seed / 65536) % 62000;
}

int bubble_sort(void)
{
    int t, x, y;

    /* bubble sort the array */
    for (x = 0; x < MAX - 1; x++)
        for (y = 0; y < MAX - x - 1; y++)
            if (a[y] > a[y + 1])
            {
                t = a[y];
                a[y] = a[y + 1];
                a[y + 1] = t;
            }

    return 0;
}
The problem is that you are storing the array in the global (static) data section, and C doesn't give any guarantee about the maximum size of that section; it is a function of the OS, the architecture, and the compiler.
So instead of creating a global array, create a global C pointer and allocate a large chunk using malloc. The memory then lives on the heap, which is much bigger and can grow at runtime, as sketched below.
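A minimal sketch of that change applied to the code above:

#include <stdio.h>
#include <stdlib.h>

#define MAX 2000000

int *a;   /* global pointer instead of a global array */

int main()
{
    a = malloc(MAX * sizeof *a);   /* the big block now lives on the heap */
    if (a == NULL)
    {
        fprintf(stderr, "out of memory\n");
        return 1;
    }
    /* ... fill, sort and print as before ... */
    free(a);
    return 0;
}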
Your array will land in the BSS section for static variables. It is not part of the executable image; the program loader allocates the required space and fills it with zeros before your program starts 'real' execution. You can even control this process if you use an embedded compiler, and fill your static data with anything you like. Such an array may occupy 2GB of RAM while the exe file stays a few kilobytes; I've just managed to use an over-2GB array this way, and my exe was 34KB. A compiler may warn you when you approach 2^31 - 1 elements (if your int is 32-bit), but a static array with 2 million elements is not a problem nowadays (unless it is an embedded system, but I bet it is not).
The problem might be that your bubble sort has two nested loops (as all bubble sorts do), so sorting this array of 2 million elements makes the program loop about 2 * 10^12 times (an arithmetic sequence):
inner loop:
pass 1: 1999999 iterations
pass 2: 1999998 iterations
...
pass 2000000: 1 iteration
So the inner comparison executes
2000000 * (1999999 + 1) / 2 = (4 / 2) * 1000000^2 = 2 * 10^12 times
(correct me if I am wrong above)
Your program simply spends a very long time in the sort routine, and you are not even aware of it. What you see is just the last random number printed and the program not responding. Even on my really fast PC, it took around 1 minute to sort a 200K-element array this way.
It is not related to your OS, compiler, heap, etc. Your program is just stuck in a loop that executes about 2 * 10^12 times when you have 2 million elements.
To verify my words, print "sort started" before sorting and "sort finished" after it. I bet the last thing you'll see is "sort started". In addition, you may print the current x value before the inner loop in bubble_sort - you'll see that it is working; a sketch of that instrumentation follows.
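A sketch of that instrumentation, dropped into bubble_sort from the question (reporting only every 100000th outer pass so the printing itself doesn't dominate the run time):

printf("sort started\n");
for (x = 0; x < MAX - 1; x++)
{
    if (x % 100000 == 0)                    /* occasional progress report */
        printf("outer pass %d of %d\n", x, MAX - 1);
    for (y = 0; y < MAX - x - 1; y++)
        if (a[y] > a[y + 1])
        {
            t = a[y];
            a[y] = a[y + 1];
            a[y + 1] = t;
        }
}
printf("sort finished\n");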
Dynamic array:
int *Array;
Array = malloc(sizeof(int) * Size);   /* Size chosen at runtime */
The original C standard (ANSI 1989/ISO 1990) required that a compiler successfully translate at least one program containing at least one example of a set of environmental limits. One of those limits was being able to create an object of at least 32,767 bytes.
This minimum limit was raised in the 1999 update to the C standard to be at least 65,535 bytes.
No C implementation is required to provide for objects greater than that size, which means that it doesn't need to allow for an array of ints with more than 65535 / sizeof(int) elements.
In very practical terms, on modern computers, it is not possible to say in advance how large an array can be created. It depends on things like the amount of physical memory installed in the computer, the amount of virtual memory provided by the OS, and the number of other tasks, drivers, and programs already running and how much memory they are using. So your program may be able to use more or less memory today than it could yesterday or will be able to tomorrow.
Many platforms place their strictest limits on automatic objects, that is, those defined inside a function without the 'static' keyword. On some platforms you can create larger arrays if they are static or dynamically allocated.
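A sketch of that distinction (the sizes are purely illustrative; calling risky() below would typically crash with a stack overflow long before the static or heap versions give trouble):

#include <stdlib.h>

static int big_static[50000000];   /* ~200MB in BSS: fine on most hosted platforms */

void risky(void)
{
    int big_auto[50000000];        /* automatic storage: likely stack overflow */
    big_auto[0] = 1;               /* touch it so the compiler can't discard it */
}

int main(void)
{
    big_static[0] = 1;                               /* fine */
    int *big_heap = malloc(50000000 * sizeof(int));  /* dynamic: also fine */
    if (big_heap)
        big_heap[0] = 1;
    free(big_heap);
    return 0;                      /* risky() deliberately never called */
}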
I have written a memory allocator that is (supposedly) faster than using malloc/free.
I have written a small amount of code to test this, but I'm not sure whether this is the correct way to profile a memory allocator. Can anyone give me some advice?
The output of this code is:
Mem_Alloc: 0.020000s
malloc: 3.869000s
difference: 3.849000s
Mem_Alloc is 193.449997 times faster.
This is the code:
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>   /* timeGetTime / timeBeginPeriod; link against winmm.lib */

int i;
int mem_alloc_time, malloc_time;
float mem_alloc_time_float, malloc_time_float, times_faster;
unsigned prev;

// Test Mem_Alloc
timeBeginPeriod (1);               // request 1 ms timer resolution
mem_alloc_time = timeGetTime ();
for (i = 0; i < 100000; i++) {
    void *p = Mem_Alloc (100000);
    Mem_Free (p);
}
// Get the duration
mem_alloc_time = timeGetTime () - mem_alloc_time;

// Test malloc
prev = mem_alloc_time;             // for getting the difference between the two times
malloc_time = timeGetTime ();
for (i = 0; i < 100000; i++) {
    void *p = malloc (100000);
    free (p);
}
// Get the duration
malloc_time = timeGetTime() - malloc_time;
timeEndPeriod (1);

// Convert both times to seconds
mem_alloc_time_float = (float)mem_alloc_time / 1000.0f;
malloc_time_float = (float)malloc_time / 1000.0f;

// Print the results
printf ("Mem_Alloc: %fs\n", mem_alloc_time_float);
printf ("malloc: %fs\n", malloc_time_float);
if (mem_alloc_time_float > malloc_time_float) {
    printf ("difference: %fs\n", mem_alloc_time_float - malloc_time_float);
} else {
    printf ("difference: %fs\n", malloc_time_float - mem_alloc_time_float);
}
times_faster = (float)max(mem_alloc_time_float, malloc_time_float) /
               (float)min(mem_alloc_time_float, malloc_time_float);
printf ("Mem_Alloc is %f times faster.\n", times_faster);
Nobody cares[*] whether your allocator is faster or slower than their allocator at allocating, and then immediately freeing, a 100KB block 100,000 times. That is not a common memory allocation pattern, and for any situation where it occurs there are probably better ways to optimize than using your memory allocator (for example, use the stack via alloca, or use a static array).
People care greatly whether or not your allocator will speed up their application.
Choose a real application. Study its performance at allocation-heavy tasks with the two different allocators, and compare that. Then study more allocation-heavy tasks.
Just for one example, you might compare the time to start up Firefox and load the StackOverflow front page. You could mock the network (or at least use a local HTTP proxy), to remove a lot of the random variation from the test. You could also use a profiler to see how much time is spent in malloc and hence whether the task is allocation-heavy or not, but beware that stuff like "overcommit" might mean that not all of the cost of memory allocation is paid in malloc.
If you wrote the allocator in order to speed up your own application, you should use your own application.
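If a realistic application is not available, at least make the micro-benchmark less trivial. A sketch of a harsher pattern with mixed sizes and overlapping lifetimes (Mem_Alloc/Mem_Free are the allocator under test from the question; the slot count and size range are arbitrary choices):

#include <stdlib.h>

void *Mem_Alloc(size_t size);   /* the allocator under test, as in the question */
void Mem_Free(void *p);

#define SLOTS 4096

/* keep many blocks of varying size alive at once, freeing and
   reallocating random slots, so the allocator has to cope with
   fragmentation instead of recycling one block over and over */
void churn(int iterations)
{
    static void *slot[SLOTS];          /* all NULL initially */

    for (int i = 0; i < iterations; i++) {
        int s = rand() % SLOTS;
        if (slot[s])
            Mem_Free(slot[s]);
        slot[s] = Mem_Alloc(16 + rand() % 8192);   /* mixed request sizes */
    }
    for (int s = 0; s < SLOTS; s++)    /* clean up */
        if (slot[s])
            Mem_Free(slot[s]);
}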
One thing to watch out for is that often what people want in an allocator is good behavior in the worst case. That is to say, it's all very well if your allocator is 99.5% faster than the default most of the time, but if it does comparatively badly when memory gets fragmented then you lose in the end, because Firefox runs for a couple of hours and then can't allocate memory any more and falls over. Then you realise why the default is taking so long over what appears to be a trivial task.
[*] This may seem harsh. Nobody cares whether it's harsh ;-)
All that the implementation you are testing against is missing is a check whether the current request size is the same as the size of the block that was just freed:
if (size == prev_free->size)
{
    current = allocate(prev_free);   /* reuse the just-freed block directly */
    return current;
}
It is "trivial" to make efficient malloc/free functions for memory until memory is not fragmented. Challenge is when you allocate lot of memory of different sizes and you try to free some and then allocate some whit no specific order.
You have to check which library you tested against and check what conditions that library was optimised for.
- efficient handling of fragmented memory
- fast free and fast malloc (you can make either one O(1))
- memory footprint
- multiprocessor support
- realloc
Check existing implementations and the problems they were dealing with, and try to improve on or solve the difficulties they had. Try to figure out what users expect from the library.
Base your tests on those assumptions, not just on some operation you think is important.
In Java it's simply:
Runtime.getRuntime().freeMemory()
How do you do that in C?
You can get the virtual memory limit for a process under Linux using getrlimit() with the RLIMIT_AS parameter.
Your comments seem to indicate that what you really want to know is the most memory that malloc can allocate in a single block. That would be approximately the maximum allowed VM size minus the current VM size. The following code returns that:
#include <sys/resource.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

uint64_t get_total_free_mem(void)
{
    unsigned long vm_pages = 0;

    /* the first field of /proc/self/statm is the total VM size, in pages */
    FILE *statm = fopen("/proc/self/statm", "r");
    if (!statm)
        return 0;
    if (fscanf(statm, "%lu", &vm_pages) != 1)
    {
        fclose(statm);
        return 0;
    }
    fclose(statm);

    uint64_t vm_size = ((uint64_t)vm_pages + 1) * (uint64_t)sysconf(_SC_PAGESIZE);

    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) != 0)
        return 0;
    if (lim.rlim_cur <= vm_size)
        return 0;
    if (lim.rlim_cur >= 0xC000000000000000ull)  /* most systems cannot address more than 48 bits */
        lim.rlim_cur = 0xBFFFFFFFFFFFFFFFull;
    return lim.rlim_cur - vm_size;
}
The only exception is that in some cases getrlimit can return 0xFFFFFFFFFFFFFFFF (unlimited); however, most 64-bit systems cannot use more than 48 bits of address space, no matter what. I've accounted for that, but there may be other edge cases I have missed. For example, 32-bit applications typically cannot allocate more than 3GB of memory, though that depends on how the kernel was built.
The real question here is why you would want to do this. The maximum amount that malloc can allocate is usually vastly larger than the amount the system can actually handle. When you call malloc, the system will happily allocate any amount you ask for (up to the AS limit, which is usually unlimited) even if no physical memory or swap is available. Until your program writes to the memory (including writing zeros), it is not actually backed by physical memory or swap. When you do write to it, that's when the system works out where to get the memory from, and that's when you might run into problems.
Your best bet is to use one of the other answers that tells you how much physical memory is available, and never allocate more than that. Usually less, in fact, since you will want to leave some physical memory for other processes and the kernel itself.
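On Linux/glibc you can query those numbers with sysconf(). A minimal sketch (_SC_PHYS_PAGES and _SC_AVPHYS_PAGES are glibc/Linux extensions, not portable C):

#include <unistd.h>
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* total and currently available physical memory, in bytes */
    uint64_t page  = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t total = (uint64_t)sysconf(_SC_PHYS_PAGES) * page;
    uint64_t avail = (uint64_t)sysconf(_SC_AVPHYS_PAGES) * page;

    printf("physical: %llu bytes, available: %llu bytes\n",
           (unsigned long long)total, (unsigned long long)avail);
    return 0;
}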