Why is the size of my HeapOverflow program 5MB? - heap-memory

So I was asking myself what would happen if I tried to do a heap overflow on Windows XP, and I was surprised to see that, once the program "ate" all the RAM (this happens instantly, by the way), the size of the process in the Task Manager drops to 5MB and doesn't move afterwards. The computer's memory usage keeps growing, however.
So why is Windows not able to see that my software is taking gigabytes of memory? I feel like it could be a security problem, because once a piece of software has eaten all the memory it can "hide" among the small processes (and maybe I'm a little bit paranoid).
Note: nothing happens when the heap is full; the CPU just jumps to 100% because my for(;;) loop runs like crazy once malloc fails.
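For reference, the program in question was presumably something along these lines - a minimal sketch, not the asker's actual code:

    #include <stdlib.h>

    int main(void) {
        /* Grab 1 MB at a time until malloc fails, then keep spinning. */
        for (;;) {
            char *p = malloc(1024 * 1024);
            if (p == NULL)
                continue; /* heap exhausted: the loop now just burns CPU */
            /* Touch each page so the OS actually commits the memory;
               otherwise it may only reserve address space. */
            for (size_t i = 0; i < 1024 * 1024; i += 4096)
                p[i] = 1;
        }
    }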
Edit: OK! I never knew that you could tweak the Task Manager columns. I learnt something today :D

Interesting experiment. By default the Task Manager shows the working set. There are other memory fields, such as the Paged and Non-paged pools and the various working sets. Page faults can also tell you that the program is trying to allocate memory but failing.

Try opening Task Manager and going to View > Select Columns..., then toggle on more of the memory columns. It may well be that the program is using far more memory, just not of the type you are viewing in Task Manager.
I think under XP there may be a Virtual Memory column, which will be of interest to you.

Related

OSDev: Why my memory allocation function suddenly stops working in the AHCI initialization function?

After my kernel calls the AHCIInit() function inside the ArchInit() function, I get a page fault in one of the MemAllocate() calls, and this only happens on real machines; I couldn't reproduce it on VirtualBox, VMware or QEMU.
I tried debugging the code, unit testing the memory allocator, and removing everything from the kernel except the memory manager and the AHCI driver itself; the only thing I discovered is that something is corrupting the allocation blocks, making MemAllocate() page fault.
The whole kernel source is at https://github.com/CHOSTeam/CHicago-Kernel, but the main files where the problem probably occurs are:
https://github.com/CHOSTeam/CHicago-Kernel/blob/master/mm/alloc.c
https://github.com/CHOSTeam/CHicago-Kernel/blob/master/arch/x86/io/ahci.c
I expected AHCIInit() to detect and initialize all the AHCI devices and the boot to continue until it reaches the session manager or the kernel shell, but on real computers it page faults before even initializing the scheduler (so no, the problem isn't my scheduler).
If it works in emulators but doesn't work on real hardware, then the first things I'd suspect are:
bugs in physical memory management. For example, physical memory manager initialization not rounding "starting address of usable RAM area" up to a page boundary or not rounding "ending address of usable RAM area" down to a page boundary, causing a "half usable RAM and half not usable RAM" page to be allocated by the heap later (where it works on emulators because the memory map provided by firmware happens to describe areas that are nicely aligned anyway).
a bug where RAM is assumed to contain zeros but may not be (where it works on emulators because they tend to leave almost all RAM full of zeros).
a race condition (where different timing causes different behavior).
However, this is a monolithic kernel, which means that you'll continually be facing "some piece of code in kernel-space did something that caused problems for another piece of code somewhere else"; and there are a bunch of common bugs with memory usage (e.g. accidentally writing past the end of what you allocated). For this reason I'd want better tools to help diagnose problems, especially for the heap.
Specifically, for the heap I'd start with canaries (e.g. put a magic number like 0xFEEDFACE before each block of memory in the heap, and another different number after each block of memory in the heap; and then check that the magic numbers are still present and correct where convenient - e.g. when blocks are freed or resized). Then I'd write a "check_heap()" function that scans through everything checking as much as possible (the canaries, if statistics like "number of free blocks" are actually correct, etc). The idea being that (whenever you suspect something might have corrupted the heap) you can insert a call to the "check_heap()" function, and move that call around until you find out which piece of code caused the heap corruption. I'd also suggest having a "what" parameter in your "kmalloc() or equivalent" (e.g. so you can do things like myFooStructure = kmalloc("Foo Structure", sizeof(struct foo));), where the provided "what string" is stored in the allocated block's meta-data, so that later on (when you find out the heap was corrupted) you can display the "what string" associated with the block before the corruption, and so that you can (e.g.) list how many of each type of thing there currently is to help determine what is leaking memory (e.g. if the number of "Foo Structure" blocks is continually increasing). Of course these things can be (should be?) enabled/disabled by compile time options (e.g. #ifdef DEBUG_HEAP).
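As a minimal sketch of that canary scheme (the header layout and names here are assumptions for illustration, not CHicago-Kernel's actual allocator):

    #include <stddef.h>
    #include <stdint.h>

    #define CANARY_HEAD 0xFEEDFACEu
    #define CANARY_TAIL 0xCAFED00Du

    /* Hypothetical per-block header; the tail canary is written just
       past the user data when the block is allocated. */
    typedef struct block {
        struct block *next;  /* list of all live blocks            */
        size_t size;         /* usable size the caller asked for   */
        const char *what;    /* "what string" passed to kmalloc()  */
        uint32_t head;       /* must still equal CANARY_HEAD       */
    } block_t;

    static block_t *all_blocks; /* maintained by kmalloc()/kfree() */

    /* Scan every block and verify both canaries; insert calls to this
       wherever corruption is suspected and move them around to bisect. */
    int check_heap(void) {
        for (block_t *b = all_blocks; b != NULL; b = b->next) {
            if (b->head != CANARY_HEAD)
                return -1; /* header clobbered: report b->what here */
            uint32_t tail = *(uint32_t *)((uint8_t *)(b + 1) + b->size);
            if (tail != CANARY_TAIL)
                return -1; /* something wrote past the end of b->what */
        }
        return 0; /* heap looks consistent */
    }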
The other thing I'd recommend is self tests. These are like unit tests, but built directly into the kernel itself and always present. For example, you could write code to pound the daylights out of the heap (e.g. allocate random sized pieces of memory and fill them with something until you run out of memory, then free half of them, then allocate more until you run out of memory again, etc; while calling the "check_heap()" function between each step); where this code could/should take a "how much pounding" parameter (so you could spend a small amount of time doing the self test, or a huge amount of time doing the self test). You could also write code to pound the daylights out of the virtual memory manager, and the physical memory manager (and the scheduler, and ...). Then you could decide to always do a small amount of self testing each time the kernel boots and/or provide a special kernel parameter/option to enable "extremely thorough self test mode".
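A heap-pounding self test along those lines might look like this sketch, where kmalloc(), kfree() and panic() are hypothetical kernel routines (kfree() assumed to accept NULL) and check_heap() is the checker sketched above:

    #include <stddef.h>
    #include <string.h>

    void *kmalloc(const char *what, size_t size); /* assumed API */
    void  kfree(void *ptr);
    void  panic(const char *msg);
    int   check_heap(void);

    /* 'rounds' is the "how much pounding" knob. */
    void heap_self_test(int rounds) {
        enum { SLOTS = 64 };
        void *slot[SLOTS] = { 0 };
        unsigned seed = 12345; /* tiny deterministic PRNG */
        for (int r = 0; r < rounds; r++) {
            /* Allocate random-sized blocks and fill them with a pattern. */
            for (int i = 0; i < SLOTS; i++) {
                seed = seed * 1103515245u + 12345u;
                size_t size = (seed >> 16) % 512 + 1;
                slot[i] = kmalloc("self test", size);
                if (slot[i] != NULL)
                    memset(slot[i], 0xAA, size);
            }
            if (check_heap() != 0)
                panic("heap corrupted after allocations");
            /* Free half, check again, then free the rest. */
            for (int i = 0; i < SLOTS; i += 2) {
                kfree(slot[i]);
                slot[i] = NULL;
            }
            if (check_heap() != 0)
                panic("heap corrupted after frees");
            for (int i = 0; i < SLOTS; i++) {
                kfree(slot[i]);
                slot[i] = NULL;
            }
        }
    }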
Don't forget that eventually (if/when the OS is released) you'll probably have to resort to "remote debugging via email" (e.g. where someone without any programming experience, who may not know very much English, sends you an email saying "OS not work"; and you have to try to figure out what is going wrong before the end user's "amount of hassle before giving up and not caring anymore" counter is depleted).

Investigating a potential memory leak

[Performance graph]
I'm unsure whether my code contains a memory leak. In order to verify this, I'm running it in an infinite loop, which gives the performance graph shown here. The yellow plot shows the process's memory usage, and the magenta plot is the process's I/O (I'm outputting the loop's iteration number on every cycle). The process is running on Windows 8.1, and I compiled it with Qt Creator in Release mode.
The initial data point of the yellow plot is 732KB and the final one is 924KB (after some 3000 iterations of the loop); data points are taken every second.
It's a little hard to tell, but the memory usage seems to follow a logarithmic progression with sudden jumps, i.e. it plateaus and then suddenly jumps by about 10KB every few hundred iterations. Moreover, looking at the I/O graph, the execution speed of the loop also seems uneven.
This behavior seems a little odd, because the memory leaks I'm used to dealing with show clear linear memory growth over time. I'm not exactly sure what to make of this: is it a very small memory leak, or could it be the OS allocating more memory to the process in order to optimize its performance? Any advice?

madvise: not understood

CONTEXT:
I run on an old laptop. I only have 128 MB of RAM free out of 512 MB total. No money to buy more RAM.
I use mmap to help me circumvent this issue and it works quite well.
C code.
Debian, 64-bit.
PROBLEM:
Despite all my efforts, I am running out of memory pretty quickly right now, and I would like to know if I can release the mmapped regions I have already read in order to free my RAM.
I read that madvise could help, especially the MADV_SEQUENTIAL option.
But I don't quite understand the whole picture.
THE NEED:
To be able to free mmapped memory after the region has been read, so that it doesn't fill my whole RAM with large files. I will not read it again soon, so it is garbage to me; it is pointless to keep it in RAM.
Update: I am not done with the file, so I don't want to call munmap. I have other stuff to do with it, but in other regions of it. Random reads.
For random read/write access to an mmap()ed file, MADV_SEQUENTIAL is probably not very useful (and may in fact cause undesired behavior). MADV_RANDOM or MADV_DONTNEED would be better options in this case. However, be aware that the kernel is free to ignore any madvise() - although in my understanding Linux currently does not, as it tends to treat madvise() more as a command than as advice...
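A minimal sketch of that approach, assuming a read-only mapping of a hypothetical file (the name and sizes are illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("bigfile.dat", O_RDONLY); /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        /* The access pattern is random, so say so up front. */
        madvise(map, st.st_size, MADV_RANDOM);

        /* ... read the first 16 MB of the mapping ... */
        off_t done = 16 * 1024 * 1024;
        if (done > st.st_size) done = st.st_size;

        /* Finished with that region: let the kernel reclaim its pages.
           The mapping stays valid; touching it again simply faults the
           data back in from the file. */
        madvise(map, (size_t)done, MADV_DONTNEED);

        munmap(map, st.st_size);
        close(fd);
        return 0;
    }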
Another option would be to mmap() only selected sections of the file as needed, and munmap() them as you're done with them, perhaps maintaining a pool of some small number of currently active mappings (i.e. mapping more than one region at once if needed, but still keeping it limited).
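A sketch of that windowed approach, assuming a read-only file and an arbitrary 4 MB window:

    #include <sys/mman.h>
    #include <sys/types.h>

    #define WINDOW (4 * 1024 * 1024) /* multiple of the page size */

    /* Map the WINDOW-sized chunk containing 'offset'; the caller reads
       inside it and munmap()s it when done. Returns MAP_FAILED on error. */
    void *map_window(int fd, off_t offset, off_t *base_out) {
        off_t base = offset - (offset % WINDOW); /* page-aligned start */
        void *p = mmap(NULL, WINDOW, PROT_READ, MAP_PRIVATE, fd, base);
        if (p != MAP_FAILED)
            *base_out = base;
        return p;
    }

    /* Usage: off_t base;
              char *w = map_window(fd, pos, &base);
              ... read w[pos - base] ...
              munmap(w, WINDOW); */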
Of course you must free resources when you're done with them, in order not to leak them and thus run out of available space too soon.
Not sure what the question is; if you know about mmap(), then surely you know about munmap() too? It's right there on the same manual page.

Yet another Memory Leak Issue (memory is still gone when program terminates) - C program on SLES

I run my C program on SUSE Linux Enterprise; it compresses several thousand large files (between 10MB and 100MB in size) and gets slower and slower as it runs (it's multi-threaded with 32 threads on an Intel Sandy Bridge board). When the program completes and is run again, it's still very slow.
When I watch the program running, I see that free memory is being depleted, which you would think is just a classic memory leak problem. But with a normal malloc()/free() mismatch, I would expect all the memory to come back when the program terminates. Instead, most of the memory doesn't get reclaimed when the program completes. The free or top command shows Mem: 63996M total, 63724M used, 272M free when the program has slowed to a halt, but after termination the free memory only grows back to about 3660M. When the program is rerun, the free memory is quickly used up.
The top program only shows that the program, while running, is using at most 4% or so of the memory.
I thought that it might be a memory fragmentation problem, but I built a small test program that simulates all the memory allocation activity in the program (with many randomized aspects built in - size/quantity), and it always returns all the memory upon completion. So I don't think that's it.
Questions:
1. Can there be a malloc()/free() mismatch that will lose memory permanently, i.e. even after the process completes?
2. What other things in a C program (not C++) can cause permanent memory loss, i.e. after the program completes, and even the terminal window closes? Only a reboot brings the memory back. I've read other posts about files not being closed causing problems, but I don't think I have that problem.
3. Is it valid to be looking at top and free for the memory statistics, i.e. do they accurately describe the memory situation? They do seem to correspond to the slowness of the program.
4. If the program only shows a 4% memory usage, will something like valgrind find this problem?
Can there be a malloc()/free() mismatch that will lose memory permanently, i.e. even after the process completes?
No: malloc and free, and even mmap, are harmless in this respect; when the process terminates, the OS (SUSE Linux in this case) claims all of its memory back (unless it's shared with some other process that's still running).
What other things in a C program (not C++) can cause permanent memory loss, i.e. after the program completes, and even the terminal window closes? Only a reboot brings the memory back. I've read other posts about files not being closed causing problems, but, I don't think I have that problem.
Like malloc/free and mmap, files opened by the process are automatically closed by the OS.
There are a few things which cause permanent memory leaks, such as huge pages, but you would certainly know about it if you were using them. Apart from that, no.
However, if you define memory loss as memory not marked 'free' immediately, then a couple of things can happen.
Writes to disk or to an mmap'd region may be cached in RAM for a while. The OS must keep those pages around until it syncs them back to disk.
Files READ by the process may remain in memory if the OS has nothing else to use that RAM for right now - on the reasonable assumption that it might need them soon and it's quicker to read the copy that's already in RAM. Again, if the OS or another process needs some of that RAM, it can be discarded instantly.
Note that as someone who paid for all my RAM, I would rather the OS used ALL of it ALL the time, if it helps in even the smallest way. Free RAM is wasted RAM.
The main problem with having little free RAM is when it is overcommitted, which is to say there are more processes (and the OS) asking for or using RAM right now than is available on the system. It sounds like you are using about 4GB of RAM in your processes, which might be a problem (and remember that the OS needs a good chunk too). But it sounds like you have plenty of RAM! Try running half the number of processes and see if it gets better.
Sometimes a memory leak can cause temporary overcommitment - it's a good idea to look into that. Try plotting the memory use of your program over time - if it rises continuously, then it may well be a leak.
Note that forking a process creates a copy that shares the memory the original allocated - until both exit or one of them calls exec. But you aren't doing that.
Is it valid to be looking at top and free for the memory statistics, i.e. do they accurately describe the memory situation? They do seem to correspond to the slowness of the program.
Yes, top and ps are perfectly reasonable ways to look at memory; in particular, observe the RES field. Ignore the VIRT field for now. In addition:
To see what the whole system is doing with memory, run:
vmstat 10
while your program is running, and for a while after. Look at what happens in the ---memory--- columns.
In addition, after your process has finished, run
cat /proc/meminfo
and post the results in your question.
If the program only shows a 4% memory usage, will something like valgrind find this problem?
Probably, but valgrind can be extremely slow, which might be impractical in this case. There are plenty of other tools which can help, such as Electric Fence and others, which do not slow your program down noticeably. I've even rolled my own in the past.
malloc()/free() work on the heap. This memory is guaranteed to be released to the OS when the process terminates. It is possible to leak memory even after the allocating process terminates using certain shared memory primitives (e.g. System V IPC). However, I don't think any of this is directly relevant.
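For illustration, a minimal sketch of how a System V segment can outlive its creator (the key and size are arbitrary):

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void) {
        /* Create a 64 MB segment under an arbitrary key. */
        int id = shmget((key_t)0x1234, 64 * 1024 * 1024, IPC_CREAT | 0600);
        if (id < 0) { perror("shmget"); return 1; }
        /* Exiting here WITHOUT shmctl(id, IPC_RMID, NULL) leaves the
           segment in the kernel: `ipcs -m` still lists it after this
           process is gone, and only IPC_RMID (or a reboot) removes it. */
        return 0;
    }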
Stepping back a bit, here's output from a lightly-loaded Linux server:
$ uptime
03:30:56 up 72 days, 8:42, 2 users, load average: 0.06, 0.17, 0.27
$ free -m
             total       used       free     shared    buffers     cached
Mem:         24104      23452        652          0      15821        978
-/+ buffers/cache:       6651      17453
Swap:         3811          5       3806
Oh no, only 652 MB free! Right? Wrong.
Whenever Linux accesses a block device (say, a hard drive), it looks for any unused memory, and stores a copy of the data there. After all, why not? The data's already in RAM, some program clearly wanted that data, and RAM that's unused can't do anyone any good. If a program comes along and asks for more memory, the cached data is discarded to make room -- until then, might as well hang onto it.
The key to this free output is not the first line, but the second. Yes, 23.4 GB of RAM is being used -- but 17.4 GB is available for programs that want it. See Help! Linux ate my RAM! for more.
I can't say why the program is getting slower, but having the "free memory" metric steadily drop down to nothing is entirely normal and not the cause.
The operating system only makes as much memory free as it absolutely needs. Making memory free is wasted effort if the memory is later used normally -- it's more efficient to just directly transition the memory from one use to another than to make the memory free just to have to make it unfree later.
The only thing the system needs free memory for is operations that require memory that can't switch used memory from one purpose to another. This is a very small set of unusual operations such as servicing network interrupts.
If you run sysctl vm.min_free_kbytes, the system will tell you the number of kilobytes it needs to keep free. It's likely less than 100MB, so having any amount more than that free is perfectly fine.
If you want more of your memory free, remove it from the computer. Otherwise, the operating system assumes that there is zero cost to using it, and thus zero benefit to making it free.
For example, consider the data you wrote to disk. The operating system could make the memory that was holding that data free. But that's a double loss. If the data you wrote to disk is later read, it will have to read it from disk rather than just grabbing it from memory. And if that memory is later needed for some other purpose, it will just have to undo all the work it went through making it free. Yuck. So if the system doesn't absolutely need free memory, it won't make it free.
My guess would be the problem is not in your program, but in the operating system. The OS keeps a cache of recently used files in memory on the assumption that you are going to access them again. It does not know with certainty what files are going to be needed, so it can end up deciding to keep the wrong ones at the expense of the ones you wish it was keeping.
It may be keeping the output files of the first run cached when you do your second run, which prevents it from effectively using the cache on the second run. You can test this theory by deleting all files from the first run (which should free them from cache) and seeing if that makes the second run go faster.
If that doesn't work, try deleting all the input files for the first run as well.
Answers
1. Yes - there is no requirement in C or C++ to release memory that is not freed back to the OS.
2. Do you have memory-mapped files, or open file handles for deleted files? Linux will not delete a file until all references to it are deallocated. Also, Linux will cache files in memory in case they need to be read again; file-cache memory usage can be ignored, as the OS will deal with it.
3. No.
4. Maybe - valgrind will highlight cases where memory is not freed.

C program stuck on uninterruptible wait while performing disk I/O on Mac OS X Snow Leopard

One line of background: I'm the developer of Redis, a NoSQL database. One of the new features I'm implementing is Virtual Memory, because Redis keeps all the data in memory. Thanks to VM, Redis is able to transfer rarely used objects from memory to disk. There are a number of reasons why this works much better than letting the OS do the swapping for us (Redis objects are built of many small objects allocated in non-contiguous places; when serialized to disk by Redis, they take 10 times less space compared to the memory pages where they live; and so forth).
Now I have an alpha implementation that's working perfectly on Linux, but not so well on Mac OS X Snow Leopard. From time to time, while Redis tries to move a page from memory to disk, the Redis process enters the uninterruptible wait state for minutes. I was unable to debug this, but it happens either in a call to fseeko() or fwrite(). After minutes the call finally returns and Redis continues working without any problem at all: no crash.
The amount of data transferred is very small, something like 256 bytes, so it should not be a matter of a very large amount of I/O being performed.
But there is an interesting detail about the swap file that's the target of the write operation. It's a big file (26 gigabytes), created by opening a file with fopen() and then enlarging it with ftruncate(). Finally the file is unlink()ed, so that Redis keeps holding a reference to it but we are sure that when the Redis process exits the OS will really free the swap file.
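For the record, the setup described above corresponds roughly to this sketch (the path and constants are illustrative, not the actual Redis code):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        const char *path = "/tmp/redis.swap";        /* illustrative path */
        off_t size = (off_t)26 * 1024 * 1024 * 1024; /* 26 GB */

        FILE *fp = fopen(path, "w+b");
        if (fp == NULL) { perror("fopen"); return 1; }

        /* On most filesystems ftruncate() makes a sparse file: blocks
           are allocated only when actually written. HFS+ handles this
           badly, as the answers below discuss. */
        if (ftruncate(fileno(fp), size) != 0) { perror("ftruncate"); return 1; }

        /* Unlink right away: the open stream keeps the inode alive, and
           the OS frees the space automatically when the process exits. */
        unlink(path);

        /* ... fseeko()/fwrite() move pages in and out of the file ... */

        fclose(fp);
        return 0;
    }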
OK, that's all, but I'm here for any further detail. And BTW you can even find the actual code in the Redis git, but it's not trivial to understand in five minutes given that it's a fairly complex system.
Thank you very much for any help.
As I understand it, HFS+ has very poor support for sparse files. So it may be that your write is triggering a file expansion that is initializing/materializing a large fraction of the file.
For example, I know mmap'ing a new large empty file and then writing at a few random locations produces a very large file on disk with HFS+. It's quite annoying since mmap and sparse files are an extremely convenient way of working with data, and virtually every other platform/filesystem out there handles this gracefully.
Is the swap file written to linearly? Meaning, do you either replace an existing block or write a new block at the end and increment a free-space pointer? If so, perhaps doing more frequent, smaller ftruncate calls to expand the file would result in shorter pauses.
As an aside, I'm curious why redis VM doesn't use mmap and then just move blocks around in an attempt to concentrate hot blocks into hot pages.
antirez, I'm not sure I'll be much help since my Apple experience is limited to the Apple ][, but I'll give it a shot.
First thing, a question: I would have thought that, for virtual memory, speed of operation would be a more important measure than disk space (especially for a NoSQL DB where speed is the whole point; otherwise you'd be using SQL, no?). But if your swap file is 26G, maybe not :-)
Some things to try (if possible).
1. Try to actually isolate the problem to the seek or the write. I have a hard time believing a seek could take that long since, at worst, it should be a buffer-pointer change. Still, I didn't write OSX, so I can't be sure.
2. Try adjusting the size of the swap file to see if that's what is causing the problem.
3. Do you ever dynamically expand the swap file (as opposed to pre-allocating it)? If you do, that may be what is causing the problem.
4. Do you always write as low in the file as you can? Creating a 26G file may not actually fill it with data but, if you create it and then write to the last byte, the OS may have to zero out all the bytes before it (deferring the initialization, if any).
5. What happens if you just pre-allocate the entire file (write to every byte) and don't unlink it? In other words, leave the file there between runs of your program (creating it if it doesn't already exist, of course). Then in your startup code for Redis, just initialize the file (pointers and such). This may get rid of any problems like those in point 4 above (see the sketch below this list).
6. Ask on the various BSD sites as well. I'm not sure how much Apple changed under the covers, but OSX is just BSD at the lowest level (Pax ducks for cover).
Also consider asking on the Apple sites (if you haven't already done so).
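Regarding point 5, a rough sketch of pre-allocating by writing every byte once (the chunk size is arbitrary):

    #include <stdio.h>
    #include <string.h>

    /* Write zeros over the whole file so every block is allocated up
       front; later writes then never trigger block allocation. */
    int preallocate(const char *path, long long size) {
        static char buf[1 << 16]; /* 64 KB of zeros (static => zeroed) */
        FILE *fp = fopen(path, "wb");
        if (fp == NULL) return -1;
        for (long long written = 0; written < size; ) {
            size_t n = sizeof buf;
            if (size - written < (long long)n)
                n = (size_t)(size - written);
            if (fwrite(buf, 1, n, fp) != n) { fclose(fp); return -1; }
            written += n;
        }
        return fclose(fp); /* 0 on success */
    }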
Well, that's my small contribution, hopefully it'll help. Good luck with your project.
Have you turned off file caching for your file? i.e. fcntl(fd, F_GLOBAL_NOCACHE, 1)
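For anyone trying this, a sketch (F_GLOBAL_NOCACHE disables caching for the file no matter who opens it; F_NOCACHE would affect only the one descriptor):

    #include <fcntl.h>
    #include <stdio.h>

    /* OS X-specific: turn off the unified buffer cache for this file. */
    int disable_cache(FILE *fp) {
        if (fcntl(fileno(fp), F_GLOBAL_NOCACHE, 1) == -1) {
            perror("fcntl(F_GLOBAL_NOCACHE)");
            return -1;
        }
        return 0;
    }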
Have you tried debugging with DTrace and/or Instruments (Apple's experimental DTrace front-end)?
Exploring Leopard with DTrace
Debugging Chrome on OS X
As Linus said once on the Git mailing list:
"I realize that OS X people have a hard time accepting it, but OS X
filesystems are generally total and utter crap - even more so than
Windows."
