I'm reading through chapter 4 of "Learn C the Hard Way", where we start to work with valgrind.
One thing I noticed is that my very small programs are allocating 1,024 bytes:
==19896== HEAP SUMMARY:
==19896== in use at exit: 0 bytes in 0 blocks
==19896== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
In the book, and in other people's code it shows 0 bytes allocated.
Here is the code:
#include <stdio.h>
int main(int argc, char *argv[])
{
int distance = 100;
// this is also a comment
printf("You are %d miles away.\n", distance);
return 0;
}
I don't understand why there needs to be 1kb of space allocated for this thing.
This bothers me and I would like to know what is going on.
Any help is appreciated. Thanks for your time!
Edit: 1KB, not 1MB
That's 1KB, not 1MB.. which isn't much memory these days (35 years ago, it was a lot).
As to why it's using that much: printf uses buffered I/O, which allocates buffers. The exact answer really depends upon the platform since c libraries and operating systems will vary. But if you were to write the same program using just system calls eg. write instead of printf, you'd probably see the memory usage go down.
printf buffers I/O before it actually writes that data to standard out (as #little_birdie mentioned). Buffered I/O refers to the practice of temporarily storing I/O operation in your application (user-space) prior to transmitting it to the kernel, which can be slow. In order to minimize these so called system calls, your application will ask for such memory ahead of time.
It is not uncommon for certain features of the system to disable this feature entirely, or even perhaps for a historical system to not have buffered I/O at all (although I'm not familiar with any).
If you want to disabled buffering on your stdout here (and thus allocation "0" bytes of heap memory), you can ask for it with setbuf like so:
#include <stdio.h>
int main()
{
int distance = 100;
setbuf(stdout, NULL);
printf("You are %d miles away.\n", distance);
return 0;
}
If you want to learn more about this, check out the excellent Linux Programming Interface
Related
I was wondering how to know how many memory is my program using in real time. I find the PrintMemoryInfo() fuction in win32, and i tried to implement it.
CODE
#include <windows.h>
#include <stdio.h>
#include <psapi.h>
size_t PrintMemoryInfo()
{
PROCESS_MEMORY_COUNTERS info;
GetProcessMemoryInfo( GetCurrentProcess( ), &info, sizeof(info) );
return (size_t)info.WorkingSetSize;
}
inline void printIt(){
double memory = (double)PrintMemoryInfo() * 0.001;
printf("%.2f kb\n",memory);
}
int main( void )
{
printIt();
char *i =(char*)malloc(100000);
printIt();
free(i);
printIt();
return 0;
PROCESS_MEMORY_COUNTERS contains the memory statistics of a process and GetProcessMemoryInfo fill it out with the data. I need to know the actual memory size has been using by the program, so i used WorkingSetSize for obtain that.
Microsoft definition:
WorkingSetSize
The current working set size, in bytes.
i passed from bytes to kilobytes and wieh i execute the program i obtain this:
output
1974.27 kb
2027.52 kb
2035.71 kb
this program make no sense to me:
almost 2 mb of memory for a program doing nothing?.
allocated 100 kb of memory but the program is using only 67 kb more.
if i freed the memory why this grows up.
Maybe i didn't get what this functions are doing, so can someone explain me what is going on please?
Thank you!
It is sum of :
size of execute file
stack memory memory that your program occupies the static variables that you use in your program
heap memory memory that your program takes a dynamic variable
and size of libraries that is use in program.
Program alloc more memory, for construct the code.
I have a problem with memory fragmentation which can be summarized in this small example:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
void *p[8000];int i,j;
p[0]=malloc(0x7F000);
if (p[0]==NULL)
printf("Alloc after failed!\n");
else
free(p[0]);
for (i=0;i<8000; i++) {
p[i]=malloc(0x40000);
if (p[i]==NULL){
printf("alloc failed for i=%d\n",i);
break;
}
}
for(j=0;j<i;j++) {
free(p[j]);
}
/*Alloc 1 will fail, Alloc 2 *might* fail, AlloC3 succeeds*/
p[0]=malloc(0x7F000);
if (p[0]==NULL)
printf("Alloc1 after failed!\n");
else {printf("alloc1 success\n");free(p[0]);}
p[0]=malloc(0x40000);
if (p[0]==NULL)
printf("Alloc2 after failed!\n");
else {printf("alloc2 success\n");free(p[0]);}
p[0]=malloc(0x10000);
if (p[0]==NULL)
printf("Alloc3 after failed!\n");
else {printf("alloc3 success\n");free(p[0]);}
printf("end");
}
The program prints (compiled with MSVC (both with debug and releas allocator) and MinGW on Win7):
alloc failed for i=7896
Alloc1 after failed!
alloc2 success
alloc3 success
end
Is there anyway I can avoid this? In my real application I can't avoid the scenario, my program reaches the 2GB memory limit... but I want to be able to continue by free-ing something.
Why is the fragmentation happening here in this small example in the first place? When I start to do the "free-s" why aren't the memory blocks compacted, as they should be adjacent.
Thanks!
Memory fragmentation is the consequence of allocating memory of varying sizes, each with varying lifespans. This is what creates holes in the free memory that in total are sufficient to satisfy a single allocation request, but each hole is too small by itself.
The situation in your program seems to be revealing a bug your heap management code that is not coalescing adjacent freed memory. I do expect there is a single 64 KB hole created by your allocation sequence.
To avoid this particular problem, I would simply hold on to the first allocation when I am done with it, storing it in my own "free list" so to speak. Then next time I need it, I take it from the "free list" rather than calling malloc().
Here's a small program to a college's task:
#include <unistd.h>
#ifndef BUFFERSIZE
#define BUFFERSIZE 1
#endif
main()
{
char buffer[BUFFERSIZE];
int i;
int j = BUFFERSIZE;
i = read(0, buffer, BUFFERSIZE);
while (i>0)
{
write(1, buffer, i);
i = read(0, buffer, BUFFERSIZE);
}
return 0;
}
There is a alternative using the stdio.h fread and fwrite functions instead.
Well. I compiled this both versions of program with 25 different values of Buffer Size: 1, 2, 4, ..., 2^i with i=0..30
This is a example of how I compile it:
gcc -DBUFFERSIZE=8388608 prog_sys.c -o bin/psys.8M
The question: In my machine (Ubuntu Precise 64, more details in the end) all versions of the program works fine:
./psys.1M < data
(data is a small file with 3 line ascii text.)
The problem is: When the buffer size is 8MB or greater. Both versions (using system call or clib functions) crashes with these buffer sizes (Segmentation Fault).
I tested many things. The first version of the code was like this:
(...)
main()
{
char buffer[BUFFERSIZE];
int i;
i = read(0, buffer, BUFFERSIZE);
(...)
This crashs when I call the read function. But with these versions:
main()
{
char buffer[BUFFERSIZE]; // SEGMENTATION FAULT HERE
int i;
int j = BUFFERSIZE;
i = read(0, buffer, BUFFERSIZE);
main()
{
int j = BUFFERSIZE; // SEGMENTATION FAULT HERE
char buffer[BUFFERSIZE];
int i;
i = read(0, buffer, BUFFERSIZE);
Both them crashs (SEGFAULT) in the first line of main. However, if I move the buffer out of main to the global scope (thus, allocation in the heap instead the stack), this works fine:
char buffer[BUFFERSIZE]; //NOW GLOBAL AND WORKING FINE
main()
{
int j = BUFFERSIZE;
int i;
i = read(0, buffer, BUFFERSIZE);
I use a Ubuntu Precise 12.04 64bits and Intel i5 M 480 1st generation.
#uname -a
Linux hostname 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:48:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
I don't know the OS limitations about the stack. Is there some way to allocate big data in stack, even if this is not a good pratice?
I think it is pretty simple: if you try to make the buffer too big, it overflows the stack.
If you use malloc() to allocate the buffer, I bet you will have no problem.
Note that there is a function called alloca() that explicitly allocates stack storage. Using alloca() is pretty much the same thing as declaring a stack variable, except that it happens while your program is running. Reading the man page for alloca() or discussions of it may help you to understand what is going wrong with your program. Here's a good discussion:
Why is the use of alloca() not considered good practice?
EDIT: In a comment, #jim mcnamara told us about ulimit, a command-line tool that can check on user limits. On this computer (running Linux Mint 14, so it should be the same limits as Ubuntu 12.10) the ulimit -s command shows a stack size limit of: 8192 K-bytes, which tracks nicely with the problem as you describe it.
EDIT: In case it wasn't completely clear, I recommend that you solve the problem by calling malloc(). Another acceptable solution would be to statically allocate the memory, which will work fine as long as your code is single-threaded.
You should only use stack allocation for small buffers, precisely because you don't want to blow up your stack. If your buffers are big, or your code will recursively call a function many times such that the little buffers will be allocated many many times, you will clobber your stack and your code will crash. The worst part is that you won't get any warning: it will either be fine, or your program already crashed.
The only disadvantage of malloc() is that it is relatively slow, so you don't want to call malloc() inside time-critical code. But it's fine for initial setup; once the malloc is done, the address of your allocated buffer is just an address in memory like any other memory address within your program's address space.
I specifically recommend against editing the system defaults to make the stack size bigger, because that makes your program much less portable. If you call standard C library functions like malloc() you could port your code to Windows, Mac, Android, etc. with little trouble; if you start calling system functions to change the default stack size, you would have much more problems porting. (And you may have no plans to port this now, but plans change!)
The stack size is quite often limited in Linux. The command ulimit -s will give the current value, in Kbytes. You can change the default in (usually) the file /etc/security/limits.conf. You can also, depending on privileges, change it on a per-process basis via the code:
#include <sys/resource.h>
// ...
struct rlimit x;
if (getrlimit(RLIMIT_STACK, &x) < 0)
perror("getrlimit");
x.rlim_cur = RLIM_INFINITY;
if (setrlimit(RLIMIT_STACK, &x) < 0)
perror("setrlimit");
Right now I am using fread() to read a file, but in other language fread() is inefficient i'v been told. Is this the same in C? If so, how would faster file reading be done?
It really shouldn't matter.
If you're reading from an actual hard disk, it's going to be slow. The hard disk is your bottle neck, and that's it.
Now, if you're being silly about your call to read/fread/whatever, and say, fread()-ing a byte at a time, then yes, it's going to be slow, as the overhead of fread() will outstrip the overhead of reading from the disk.
If you call read/fread/whatever and request a decent portion of data. This will depend on what you're doing: sometimes all want/need is 4 bytes (to get a uint32), but sometimes you can read in large chunks (4 KiB, 64 KiB, etc. RAM is cheap, go for something significant.)
If you're doing small reads, some of the higher level calls like fread() will actual help you by buffering data behind your back. If you're doing large reads, it might not be helpful, but switching from fread to read will probably not yield that much improvement, as you're bottlenecked on disk speed.
In short: if you can, request a liberal amount when reading, and try to minimize what you write. For large amounts, powers of 2 tend to be friendlier than anything else, but of course, it's OS, hardware, and weather dependent.
So, let's see if this might bring out any differences:
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#define BUFFER_SIZE (1 * 1024 * 1024)
#define ITERATIONS (10 * 1024)
double now()
{
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec + tv.tv_usec / 1000000.;
}
int main()
{
unsigned char buffer[BUFFER_SIZE]; // 1 MiB buffer
double end_time;
double total_time;
int i, x, y;
double start_time = now();
#ifdef USE_FREAD
FILE *fp;
fp = fopen("/dev/zero", "rb");
for(i = 0; i < ITERATIONS; ++i)
{
fread(buffer, BUFFER_SIZE, 1, fp);
for(x = 0; x < BUFFER_SIZE; x += 1024)
{
y += buffer[x];
}
}
fclose(fp);
#elif USE_MMAP
unsigned char *mmdata;
int fd = open("/dev/zero", O_RDONLY);
for(i = 0; i < ITERATIONS; ++i)
{
mmdata = mmap(NULL, BUFFER_SIZE, PROT_READ, MAP_PRIVATE, fd, i * BUFFER_SIZE);
// But if we don't touch it, it won't be read...
// I happen to know I have 4 KiB pages, YMMV
for(x = 0; x < BUFFER_SIZE; x += 1024)
{
y += mmdata[x];
}
munmap(mmdata, BUFFER_SIZE);
}
close(fd);
#else
int fd;
fd = open("/dev/zero", O_RDONLY);
for(i = 0; i < ITERATIONS; ++i)
{
read(fd, buffer, BUFFER_SIZE);
for(x = 0; x < BUFFER_SIZE; x += 1024)
{
y += buffer[x];
}
}
close(fd);
#endif
end_time = now();
total_time = end_time - start_time;
printf("It took %f seconds to read 10 GiB. That's %f MiB/s.\n", total_time, ITERATIONS / total_time);
return 0;
}
...yields:
$ gcc -o reading reading.c
$ ./reading ; ./reading ; ./reading
It took 1.141995 seconds to read 10 GiB. That's 8966.764671 MiB/s.
It took 1.131412 seconds to read 10 GiB. That's 9050.637376 MiB/s.
It took 1.132440 seconds to read 10 GiB. That's 9042.420953 MiB/s.
$ gcc -o reading reading.c -DUSE_FREAD
$ ./reading ; ./reading ; ./reading
It took 1.134837 seconds to read 10 GiB. That's 9023.322991 MiB/s.
It took 1.128971 seconds to read 10 GiB. That's 9070.207522 MiB/s.
It took 1.136845 seconds to read 10 GiB. That's 9007.383586 MiB/s.
$ gcc -o reading reading.c -DUSE_MMAP
$ ./reading ; ./reading ; ./reading
It took 2.037207 seconds to read 10 GiB. That's 5026.489386 MiB/s.
It took 2.037060 seconds to read 10 GiB. That's 5026.852369 MiB/s.
It took 2.031698 seconds to read 10 GiB. That's 5040.119180 MiB/s.
...or no noticeable difference. (fread is winning sometimes, sometimes read)
Note: The slow mmap is surprising. This might be due to me asking it to allocate the buffer for me. (I wasn't sure about requirements of supplying a pointer...)
In really short: Don't prematurely optimize. Make it run, make it right, make it fast, that order.
Back by popular demand, I ran the test on a real file. (The first 675 MiB of the Ubuntu 10.04 32-bit desktop installation CD ISO) These were the results:
# Using fread()
It took 31.363983 seconds to read 675 MiB. That's 21.521501 MiB/s.
It took 31.486195 seconds to read 675 MiB. That's 21.437967 MiB/s.
It took 31.509051 seconds to read 675 MiB. That's 21.422416 MiB/s.
It took 31.853389 seconds to read 675 MiB. That's 21.190838 MiB/s.
# Using read()
It took 33.052984 seconds to read 675 MiB. That's 20.421757 MiB/s.
It took 31.319416 seconds to read 675 MiB. That's 21.552126 MiB/s.
It took 39.453453 seconds to read 675 MiB. That's 17.108769 MiB/s.
It took 32.619912 seconds to read 675 MiB. That's 20.692882 MiB/s.
# Using mmap()
It took 31.897643 seconds to read 675 MiB. That's 21.161438 MiB/s.
It took 36.753138 seconds to read 675 MiB. That's 18.365779 MiB/s.
It took 36.175385 seconds to read 675 MiB. That's 18.659097 MiB/s.
It took 31.841998 seconds to read 675 MiB. That's 21.198419 MiB/s.
...and one very bored programmer later, we've read the CD ISO off disk. 12 times. Before each test, the disk cache was cleared, and during each test there was enough, and approximately the same amout of, RAM free to hold the CD ISO twice in RAM.
One note of interest: I was originally using a large malloc() to fill memory and thus minimize the effects of disk caching. It may be worth noting that mmap performed terribly here. The other two solutions merely ran, mmap ran and, for reasons I can't explain, began pushing memory to swap, which killed its performance. (The program was not leaking, as far as I know (the source code is above) - the actual "used memory" stayed constant throughout the trials.)
read() posted the fastest time overall, fread() posted really consistent times. This may have been to some small hiccup during the testing, however. All told, the three methods were just about equal. (Especially fread and read...)
If you are willing to go beyond the C spec into OS specific code, memory mapping is generally considered the most efficient way.
For Posix, check out mmap and for Windows check out OpenFileMapping
What's slowing you down?
If you need the fastest possible file reading (while still playing nicely with the operating system), go straight to your OS's calls, and make sure you study how to use them most effectively.
How is your data physically laid out? For example, rotating drives might read data stored at the edges faster, and you want to minimize or eliminate seek times.
Is your data pre-processed? Do you need to do stuff between loading it from disk and using it?
What is the optimum chunk size for reading? (It might be some even multiple of the sector size. Check your OS documentation.)
If seek times are a problem, re-arrange your data on disk (if you can) and store it in larger, pre-processed files instead of loading small chunks from here and there.
If data transfer times are a problem, perhaps consider compressing the data.
I'm thinking of the read system call.
Keep in mind that fread is a wrapper for 'read'.
On the other hand fread has an internal buffer, so 'read' may be faster but i think 'fread' will be more efficient.
If fread is slow it is because of the additional layers it adds to the underlying operating system mechanism to read from a file that interfere with how your particular program is using fread. In other words, it's slow because you aren't using it the way it has been optimized for.
Having said that, faster file reading would be done by understanding how the operating system I/O functions work and providing your own abstraction that handles your program's particular I/O access patterns better. Most of the time you can do this with memory mapping the file.
However, if you are hitting the limits of the machine you are running on, memory mapping probably won't be sufficient. At that point it's really up to you to figure out how to optimize your I/O code.
It's not the fastest but it's pretty good and short.
#include <fcntl.h>
#include <unistd.h>
int main() {
int f = open("file1", O_RDWR);
char buffer[4096];
while ( read(f, buffer, 4096) > 0 ) {
printf("%s", buffer);
}
}
The problem that some people have noted here, is that depending on your source, your target buffer size, etc, you can create a custom handler for that specific case, but there are other cases, like block/character devices, i.e. /dev/* where standard rules like that do or don't apply and your backing source might be something that pops character off serially without any buffering, like an I2C bus, standard RS-232, etc. And there are some other sources where character devices are memory mappable large sections of memory like nvidia does with their video driver character device (/dev/nvidiactl).
One other design implementation that many people have chosen in high-performance applications is asynchronous instead of synchronous I/O for handling how data is read. Look into libaio, and the ported versions of libaio which provide prepackaged solutions for asynchronous I/O, as well as look into using read with shared memory between a worker and consumer thread (but keep in mind that this will increase programming complexity if you go this route). Asynchronous I/O is also something that you can't get out of the box with stdio that you can get with standard OS system calls. Just be careful as there are bits of read which are `portable' according to the spec, but not all operating systems (like FreeBSD for instance) support POSIX STREAMs (by choice).
Another thing that you can do (depending on how portable your data is) is look into compression and/or conversion into a binary format like database formats, i.e. BDB, SQL, etc. Some database formats are portable across machines using endianness conversion functions.
In general it would be best to take a set of algorithms and methods, run performance tests using the different methods, and evaluate the best algorithm that serves the mean task that your application would serve. That would help you determine what the best performing algorithm is.
Maybe check out how perl does it. Perl's I/O routines are optimized, and are, I gather, the reason why processing text with a perl filter can be twice as fast as doing the same transformation with sed.
Obviously perl is pretty complex, and I/O is only one small part of what it does. I've never looked at its source so I couldn't give you any better directions than to point you here.
I would like to write a program to consume all the memory available to understand the outcome. I've heard that linux starts killing the processes once it is unable to allocate the memory.
Can anyone help me with such a program.
I have written the following, but the memory doesn't seem to get exhausted:
#include <stdlib.h>
int main()
{
while(1)
{
malloc(1024*1024);
}
return 0;
}
You should write to the allocated blocks. If you just ask for memory, linux might just hand out a reservation for memory, but nothing will be allocated until the memory is accessed.
int main()
{
while(1)
{
void *m = malloc(1024*1024);
memset(m,0,1024*1024);
}
return 0;
}
You really only need to write 1 byte on every page (4096 bytes on x86 normally) though.
Linux "over commits" memory. This means that physical memory is only given to a process when the process first tries to access it, not when the malloc is first executed. To disable this behavior, do the following (as root):
echo 2 > /proc/sys/vm/overcommit_memory
Then try running your program.
Linux uses, by default, what I like to call "opportunistic allocation". This is based on the observation that a number of real programs allocate more memory than they actually use. Linux uses this to fit a bit more stuff into memory: it only allocates a memory page when it is used, not when it's allocated with malloc (or mmap or sbrk).
You may have more success if you do something like this inside your loop:
memset(malloc(1024*1024L), 'w', 1024*1024L);
In my machine, with an appropriate gb value, the following code used 100% of the memory, and even got memory into the swap.
You can see that you need to write only one byte in each page: memset(m, 0, 1);,
If you change the page size: #define PAGE_SZ (1<<12) to a bigger page size: #define PAGE_SZ (1<<13) then you won't be writing to all the pages you allocated, thus you can see in top that the memory consumption of the program goes down.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define PAGE_SZ (1<<12)
int main() {
int i;
int gb = 2; // memory to consume in GB
for (i = 0; i < ((unsigned long)gb<<30)/PAGE_SZ ; ++i) {
void *m = malloc(PAGE_SZ);
if (!m)
break;
memset(m, 0, 1);
}
printf("allocated %lu MB\n", ((unsigned long)i*PAGE_SZ)>>20);
getchar();
return 0;
}
A little known fact (though it is well documented) - you can (as root) prevent the OOM killer from claiming your process (or any other process) as one of its victims. Here is a snippet from something directly out of my editor, where I am (based on configuration data) locking all allocated memory to avoid being paged out and (optionally) telling the OOM killer not to bother me:
static int set_priority(nex_payload_t *p)
{
struct sched_param sched;
int maxpri, minpri;
FILE *fp;
int no_oom = -17;
if (p->cfg.lock_memory)
mlockall(MCL_CURRENT | MCL_FUTURE);
if (p->cfg.prevent_oom) {
fp = fopen("/proc/self/oom_adj", "w");
if (fp) {
/* Don't OOM me, Bro! */
fprintf(fp, "%d", no_oom);
fclose(fp);
}
}
I'm not showing what I'm doing with scheduler parameters as its not relevant to the question.
This will prevent the OOM killer from getting your process before it has a chance to produce the (in this case) desired effect. You will also, in effect, force most other processes to disk.
So, in short, to see fireworks really quickly...
Tell the OOM killer not to bother you
Lock your memory
Allocate and initialize (zero out) blocks in a never ending loop, or until malloc() fails
Be sure to look at ulimit as well, and run your tests as root.
The code I showed is part of a daemon that simply can not fail, it runs at a very high weight (selectively using the RR or FIFO scheduler) and can not (ever) be paged out.
Have a look at this program.
When there is no longer enough memory malloc starts returning 0
#include <stdlib.h>
#include <stdio.h>
int main()
{
while(1)
{
printf("malloc %d\n", (int)malloc(1024*1024));
}
return 0;
}
On a 32-bit Linux system, the maximum that a single process can allocate in its address space is approximately 3Gb.
This means that it is unlikely that you'll exhaust the memory with a single process.
On the other hand, on 64-bit machine you can allocate as much as you like.
As others have noted, it is also necessary to initialise the memory otherwise it does not actually consume pages.
malloc will start giving an error if EITHER the OS has no virtual memory left OR the process is out of address space (or has insufficient to satisfy the requested allocation).
Linux's VM overcommit also affects exactly when this is and what happens, as others have noted.
I just exexuted #John La Rooy's snippet:
#include <stdlib.h>
#include <stdio.h>
int main()
{
while(1)
{
printf("malloc %d\n", (int)malloc(1024*1024));
}
return 0;
}
but it exhausted my memory very fast and cause the system hanged so that I had to restart it.
So I recommend you change some code.
For example:
On my ubuntu 16.04 LTS the code below takes about 1.5 GB ram, physical memory consumption raised from 1.8 GB to 3.3 GB after executing and go down back to 1.8GiB after finishing execution.Though it looks like I have allocate 300GiB ram in the code.
#include <stdlib.h>
#include <stdio.h>
int main()
{
while(int i<300000)
{
printf("malloc %p\n", malloc(1024*1024));
i += 1;
}
return 0;
}
When index i is less then 100000(ie, allocate less than 100 GB), either physical or virtual memory are just very slightly used(less then 100MB), I don't know why, may be there is something to do with virtual memory.
One thing interesting is that when the physical memory begins to shrink, the addresses malloc() returns definitely changes, see picture link below.
I used malloc() and calloc(), seems that they behave similarily in occupying physical memory.
memory address number changes from 48 bits to 28 bits when physical memory begins shrinking
I was bored once and did this. Got this to eat up all memory and needed to force a reboot to get it working again.
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char** argv)
{
while(1)
{
malloc(1024 * 4);
fork();
}
}
If all you need is to stress the system, then there is stress tool, which does exactly what you want. It's available as a package for most distros.