The scenario is that I'm running a test program which grows to 8GB in size, and then a series of asserts are done via Check.
I can see my process grow to 8GB in size in 'top' and I see my system memory grow accordingly. However, when a large number of asserts are run afterwards, I see VIRT stop growing as expected, but my total used memory continues to increase as Check works through the asserts. So according to top there's no more memory being given to any process, yet something is still chewing through memory.
I sort top by memory usage and see nothing else is reserving memory. Eventually I hit 100% swap and physical memory usage and the process gets killed.
Note that Check will fork each 'test' into its own process. When I break the test program (w/ ctrl+c), I still see the process in top, and its VIRT still reads 8GB.
I believe memory is being eaten by Check, because this behavior doesn't happen when I take out all the asserts. I saw in Check that a tmpfile() is created and used to track the last assert that happened and I see /tmp growing as the assert phase begins. If I modify Check code to write to a file in /tmp (instead of using tmpfile()), I see that file grows to be GBs big.
1) Why doesn't the virtual address space taken up by the open file show up as part of the process' used memory? Note that Check forks off each 'test'. Also, even though swap is full, shouldn't unused parts of the file just get paged back out to disk? (Writes are done via fwrite, not mmap.)
2) A secondary question, I haven't used tmpfile() before, but why doesn't any file show up in /tmp when tmpfile() is invoked? If it is because the file is unlinked immediately, does that mean any unlinked file won't show up in the filesystem? (My understanding of what unlink does is also rudimentary).
edit: I'm using Arch Linux w/ kernel 4.0.5-1 and procps-ng version 3.3.10
A small experiment shows that yes: because tmpfile() creates its file in /tmp, which is a tmpfs, the file is held purely in memory, and because I was running out of memory, my program got killed. Because the program held the last open file descriptor, the file was deleted (freed from the tmpfs) after the program was killed. Since /tmp exists only in memory as a tmpfs, the file can't be 'paged out' to disk (the filesystem itself exists only in memory), and swap was also full, hence the OOM kill.
By performing the same experiment with a file in /var/tmp, which is instead on disk, no extra memory was being taken up so I didn't get this error.
While I couldn't view this behavior with top, free -h showed the memory being taken up under shared and buff/cache.
To narrow down the exact culprit, I used:
lsof | awk '$5 == "REG"' | sort -n -r -k 7,7 | head -n 5
from https://serverfault.com/questions/207100/how-can-i-find-phantom-storage-usage
Note that the above will not work if /tmp is already full.
So the open file was not the problem, i.e. buffering was not the problem. The problem was that the file existed in a tmpfs.
To answer my #2, the file doesn't show up because tmpfile() unlinks it immediately after creating it; lsof marks it as 'deleted':
$ lsof | awk '$5 == "REG"' | sort -n -r -k 7,7 | head -n 5
toy 13624 me 3u REG 0,33 4294963200 12505948 /tmp/tmpf0W4HOM (deleted)
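For what it's worth, here is a minimal sketch (my own toy, not Check's actual code) of what tmpfile() effectively does: create a file, unlink it immediately, and keep writing through the still-open descriptor. While it runs, nothing shows up under /tmp, but lsof lists the file as (deleted), and its space is only released when the descriptor is closed:

    /* Toy demo of tmpfile() semantics: the name disappears immediately,
     * but the inode and its blocks live on until the last fd is closed. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void)
    {
        int fd = open("/tmp/demo_unlink", O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd == -1) { perror("open"); return 1; }

        unlink("/tmp/demo_unlink");   /* name gone; inode lives on via fd */

        /* Writes still consume filesystem space -- which is RAM, if /tmp
         * is a tmpfs -- even though no directory entry exists anymore. */
        char buf[4096] = {0};
        for (int i = 0; i < 1024; i++)        /* ~4 MB, invisible in /tmp */
            write(fd, buf, sizeof buf);

        puts("run `lsof -p <this pid>` now; press Enter to exit");
        getchar();

        close(fd);                    /* only now is the space freed */
        return 0;
    }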
The stat() system call takes a long time when I try to stat a file that is corrupted (its magic number is corrupted).
I have a print statement after this call in my source code, and it only gets printed after some delay.
I am not sure whether stat() does any retry on the call. If any documentation on this is available, please share it; it would be a great help.
It returned an input/output error, errno 5 (EIO), so I am not sure whether the file or the filesystem is corrupted.
This can be caused by bad blocks on an aging or damaged spinning disk. There are two other symptoms that will likely occur concurrently:
Copious explicit I/O errors reported by the kernel in the system logs.
A sudden spike in load average. This happens because processes which are stuck waiting on I/O are in uninterruptible sleep while the kernel busy-loops in an attempt to interact with the hardware, causing the system to become sluggish temporarily. You cannot stop this from happening, or kill processes in uninterruptible sleep. It's a sort of OS Achilles' heel.
If this is the case, unmount the filesystems involved and run e2fsck -c -y on them. If it is the root filesystem, you will need to, e.g., boot the system with a live CD and do it from there. From man e2fsck:
-c     This option causes e2fsck to use the badblocks(8) program to do
       a read-only scan of the device in order to find any bad blocks.
       If any bad blocks are found, they are added to the bad block
       inode to prevent them from being allocated to a file or
       directory. If this option is specified twice, then the bad
       block scan will be done using a non-destructive read-write test.
Note that -cc takes a long time; -c should be sufficient. -y answers yes automatically to all questions, which you might as well do since there may be a lot of those.
You will probably lose some data (have a look in /lost+found afterward); hopefully the system still boots. At the very least, the filesystems are now safe to mount. The disk itself may or may not last a while longer. I've done this and had them remain fine for months more, but don't count on it.
If this is a SMART drive, there are apparently some other tools you can use to diagnose and deal with the same problem, although what I've outlined here is probably good enough.
The memory usage of a process can be displayed by running:
$ ps -C processname -o size
SIZE
3808
Is there any way to retrieve this information without executing ps (or any external program), or reading /proc?
On a Linux system, a process' memory usage can be queried by reading /proc/[pid]/statm, where [pid] is the PID of the process. If a process wants to query its own data, it can do so by reading /proc/self/statm instead. man 5 proc says:
/proc/[pid]/statm
       Provides information about memory usage, measured in pages.
       The columns are:

           size       total program size
                      (same as VmSize in /proc/[pid]/status)
           resident   resident set size
                      (same as VmRSS in /proc/[pid]/status)
           share      shared pages (from shared mappings)
           text       text (code)
           lib        library (unused in Linux 2.6)
           data       data + stack
           dt         dirty pages (unused in Linux 2.6)
You could just open the file with fopen("/proc/self/statm", "r") and read the contents. Since the file reports its results in pages, you will also want to find the page size: getpagesize() returns the size of a page, in bytes.
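For example, here is a minimal sketch along those lines (it reads only the first two statm columns and assumes a Linux /proc):

    /* Sketch: read this process' own memory counters from /proc/self/statm
     * and convert pages to bytes with getpagesize(). Linux-specific. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long size, resident;                  /* first two statm columns */
        FILE *f = fopen("/proc/self/statm", "r");
        if (!f) { perror("fopen"); return 1; }
        if (fscanf(f, "%ld %ld", &size, &resident) != 2) {
            fclose(f);
            return 1;
        }
        fclose(f);

        long page = getpagesize();            /* bytes per page */
        printf("VmSize: %ld kB, VmRSS: %ld kB\n",
               size * page / 1024, resident * page / 1024);
        return 0;
    }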
You have a few options for finding the memory usage of a program:
Run it within a profiler like Valgrind or memprof.
exec/proc_open/fork a new process to use ps, top, or pmap as you would from the command line
bundle ps into your app and use it directly (it's open source, of course!)
Use the /proc system (which is all that ps does, anyways...)
Query the kernel, which watches over process memory operations. The /proc filesystem is just a view into the kernel's internal data structures, so this is really already done for you.
Develop your own mechanism to compute memory usage without kernel assistance.
The first few options are all educational from a system administration perspective, and would be the best options in a real-life situation, but the last bullet point is probably the most interesting. You'd probably want to read the source of Valgrind or memprof to see how it works, but essentially what you'd need to do is insert your mechanism between the app and the kernel, and intercept any requests for memory allocation. Additionally, when the process started, you would want to initialize its memory space with a preset value like 0xDEADBEEF. Then, after the process finished, you could read through the memory space and count the occurrences of words other than your preset value, giving you an estimate of memory usage.
Of course, things are always more complicated than they seem. What about memory used by shared libraries? Pipes? Shared memory between your process and another? System calls? Virtual memory allocated but not used? Data buffered to disk? There are a lot of judgement calls to be made beyond the simple question 'memory of a process'; see this post for some additional concerns.
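To make the interception idea concrete, here is a rough sketch, not a production tool: an LD_PRELOAD shim that counts bytes requested from malloc(). It is glibc-specific (it forwards to the internal __libc_malloc, which sidesteps the dlsym recursion a portable shim must handle), it ignores free() and threads, and malloc_spy is just a made-up name:

    /* malloc_spy.c -- count bytes requested via malloc(). Build with:
     *   gcc -shared -fPIC malloc_spy.c -o malloc_spy.so
     * and run any program under it with:
     *   LD_PRELOAD=./malloc_spy.so ./your_program
     */
    #include <stdio.h>
    #include <stdlib.h>

    extern void *__libc_malloc(size_t);   /* glibc's real allocator */

    static size_t total;                  /* bytes requested so far
                                             (not thread-safe; demo only) */
    void *malloc(size_t n)
    {
        total += n;
        return __libc_malloc(n);
    }

    /* Print the running total when the program exits. */
    __attribute__((destructor))
    static void report(void)
    {
        fprintf(stderr, "malloc'd %zu bytes in total\n", total);
    }

Note this measures requests to the allocator, not actual physical memory use; the preset-value trick described above would also catch stack and mmap usage that this misses.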
I run my C program, which compresses several thousand large files (between 10MB and 100MB in size), on SUSE Linux Enterprise, and the program gets slower and slower as it runs (it's running multi-threaded with 32 threads on an Intel Sandy Bridge board). When the program completes and is run again, it's still very slow.
When I watch the program running, I see that the memory is being depleted while the program runs, which you would think is just a classic memory leak problem. But, with a normal malloc()/free() mismatch, I would expect all the memory to return when the program terminates. But, most of the memory doesn't get reclaimed when the program completes. The free or top command shows Mem: 63996M total, 63724M used, 272M free when the program is slowed down to a halt, but, after the termination, the free memory only grows back to about 3660M. When the program is rerun, the free memory is quickly used up.
The top program only shows that the program, while running, is using at most 4% or so of the memory.
I thought that it might be a memory fragmentation problem, but, I built a small test program that simulates all the memory allocation activity in the program (many randomized aspects were built in - size/quantity), and it always returns all the memory upon completion. So, I don't think that's it.
Questions:
Can there be a malloc()/free() mismatch that will lose memory permanently, i.e. even after the process completes?
What other things in a C program (not C++) can cause permanent memory loss, i.e. after the program completes, and even the terminal window closes? Only a reboot brings the memory back. I've read other posts about files not being closed causing problems, but, I don't think I have that problem.
Is it valid to be looking at top and free for the memory statistics, i.e. do they accurately describe the memory situation? They do seem to correspond to the slowness of the program.
If the program only shows a 4% memory usage, will something like valgrind find this problem?
Can there be a malloc()/free() mismatch that will lose memory permanently, i.e. even after the process completes?
No, malloc and free, and even mmap are harmless in this respect, and when the process terminates the OS (SUSE Linux in this case) claims all their memory back (unless it's shared with some other process that's still running).
What other things in a C program (not C++) can cause permanent memory loss, i.e. after the program completes, and even the terminal window closes? Only a reboot brings the memory back. I've read other posts about files not being closed causing problems, but, I don't think I have that problem.
Like malloc/free and mmap, files opened by the process are automatically closed by the OS.
There are a few things which can cause permanent memory leaks, like huge pages, but you would certainly know about it if you were using them. Apart from that, no.
However, if you define memory loss as memory not marked 'free' immediately, then a couple of things can happen.
Writes to disk or to an mmap may be cached for a while in RAM. The OS must keep the pages around until it syncs them back to disk.
Files READ by the process may remain in memory if the OS has nothing else to use that RAM for right now - on the reasonable assumption that it might need them soon and it's quicker to read the copy that's already in RAM. Again, if the OS or another process needs some of that RAM, it can be discarded instantly.
Note that as someone who paid for all my RAM, I would rather the OS used ALL of it ALL the time, if it helps in even the smallest way. Free RAM is wasted RAM.
The main problem with having little free RAM is when it is overcommitted, which is to say there are more processes (and the OS) asking for or using RAM right now than is available on the system. It sounds like you are using about 4GB of RAM in your processes, which might be a problem (and remember the OS needs a good chunk too). But it sounds like you have plenty of RAM! Try running half the number of processes and see if it gets better.
Sometimes a memory leak can cause temporary overcommitment - it's a good idea to look into that. Try plotting the memory use of your program over time - if it rises continuously, then it may well be a leak.
Note that forking a process creates a copy that shares the memory the original allocated (copy-on-write), until both exit or one of them execs. But you aren't doing that.
Is it valid to be looking at top and free for the memory statistics, i.e. do they accurately describe the memory situation? They do seem to correspond to the slowness of the program.
Yes, top and ps are perfectly reasonable ways to look at memory, in particular observe the RES field. Ignore the VIRT field for now. In addition:
To see what the whole system is doing with memory, run:
vmstat 10
while your program is running, and for a while after. Look at what happens to the ---memory--- columns.
In addition, after your process has finished, run
cat /proc/meminfo
And post the results in your question.
If the program only shows a 4% memory usage, will something like valgrind find this problem?
Probably, but it can be extremely slow, which might be impractical in this case. There are plenty of other tools which can help, such as electricfence and others, which do not slow your program down noticeably. I've even rolled my own in the past.
malloc()/free() work on the heap. This memory is guaranteed to be released to the OS when the process terminates. It is possible to leak memory even after the allocating process terminates using certain shared memory primitives (e.g. System V IPC). However, I don't think any of this is directly relevant.
Stepping back a bit, here's output from a lightly-loaded Linux server:
$ uptime
03:30:56 up 72 days, 8:42, 2 users, load average: 0.06, 0.17, 0.27
$ free -m
             total       used       free     shared    buffers     cached
Mem:         24104      23452        652          0      15821        978
-/+ buffers/cache:       6651      17453
Swap:         3811          5       3806
Oh no, only 652 MB free! Right? Wrong.
Whenever Linux accesses a block device (say, a hard drive), it looks for any unused memory, and stores a copy of the data there. After all, why not? The data's already in RAM, some program clearly wanted that data, and RAM that's unused can't do anyone any good. If a program comes along and asks for more memory, the cached data is discarded to make room -- until then, might as well hang onto it.
The key to this free output is not the first line, but the second. Yes, 23.4 GB of RAM is being used -- but 17.4 GB is available for programs that want it. See Help! Linux ate my RAM! for more.
I can't say why the program is getting slower, but having the "free memory" metric steadily drop down to nothing is entirely normal and not the cause.
The operating system only makes as much memory free as it absolutely needs. Making memory free is wasted effort if the memory is later used normally -- it's more efficient to just directly transition the memory from one use to another than to make the memory free just to have to make it unfree later.
The only thing the system needs free memory for is operations that require memory that can't switch used memory from one purpose to another. This is a very small set of unusual operations such as servicing network interrupts.
If you run sysctl vm.min_free_kbytes, the system will tell you the number of kB it needs free. It's likely less than 100MB, so having any amount more than that free is perfectly fine.
If you want more of your memory free, remove it from the computer. Otherwise, the operating system assumes that there is zero cost to using it, and thus zero benefit to making it free.
For example, consider the data you wrote to disk. The operating system could make the memory that was holding that data free. But that's a double loss. If the data you wrote to disk is later read, it will have to read it from disk rather than just grabbing it from memory. And if that memory is later needed for some other purpose, it will just have to undo all the work it went through making it free. Yuck. So if the system doesn't absolutely need free memory, it won't make it free.
My guess would be the problem is not in your program, but in the operating system. The OS keeps a cache of recently used files in memory on the assumption that you are going to access them again. It does not know with certainty what files are going to be needed, so it can end up deciding to keep the wrong ones at the expense of the ones you wish it was keeping.
It may be keeping the output files of the first run cached when you do your second run, which prevents it from effectively using the cache on the second run. You can test this theory by deleting all files from the first run (which should free them from cache) and seeing if that makes the second run go faster.
If that doesn't work, try deleting all the input files for the first run as well.
Answers
Yes; there is no requirement in C or C++ that memory which is not freed be released back to the OS.
Do you have memory-mapped files, open file handles for deleted files, etc.? Linux will not delete a file until all references to it are released. Also, Linux will cache files in memory in case they need to be read again; file cache memory usage can be ignored, as the OS will deal with it.
No.
Maybe; valgrind will highlight cases where memory is not freed.
Is there any way to retrieve the stack and heap usage in C on Linux?
I want to know the amount of memory taken specifically by stack/heap.
If you know the pid (e.g. 1234) of the process, you could use the pmap 1234 command, which prints the memory map. You can also read the /proc/1234/maps file (actually a textual pseudo-file, because it does not exist on disk; its content is lazily synthesized by the kernel). Read the proc(5) man page. It is Linux specific, but inspired by the /proc file systems of other Unix systems.
(You'd better open, read, then close that pseudo-file quickly, and not keep a file descriptor on it open for many seconds; it is more of a "pipe"-like thing, since you need to read it sequentially, and it is a pseudo-file with no actual disk I/O involved.)
And from inside your program, you could read the /proc/self/maps file. Try the cat /proc/self/maps command in a terminal to see the virtual address space map of the process running that cat command, and cat /proc/$$/maps to see the map of your current shell.
All this gives you the memory map of a process, which contains the various memory segments used by it (notably space for the stack, for the heap, and for various dynamic libraries).
You can also use the getrusage system call.
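Here is a hedged sketch (Linux-specific; the file parsing is my own, not a standard API) combining the two: it prints the [heap] and [stack] lines from /proc/self/maps, plus the peak resident set size from getrusage():

    /* Sketch: report this process' heap and stack mappings and peak RSS. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>

    int main(void)
    {
        char line[512];
        FILE *f = fopen("/proc/self/maps", "r");
        if (!f) { perror("fopen"); return 1; }

        while (fgets(line, sizeof line, f))
            if (strstr(line, "[heap]") || strstr(line, "[stack]"))
                fputs(line, stdout);          /* address ranges + perms */
        fclose(f);

        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) == 0)
            printf("peak RSS: %ld kB\n", ru.ru_maxrss); /* kB on Linux */
        return 0;
    }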
Notice also that with multi-threading, each thread of a process has its own call stack.
You could also parse the /proc/$pid/statm or /proc/self/statm pseudo-file, or the /proc/$pid/status or /proc/self/status one.
But see also Linux Ate my RAM for some hints.
Consider using valgrind (at least on Linux) to debug memory leaks.
I have a read benchmark and between consecutive runs, I have to make sure that the data does not reside in memory to avoid effects seen due to caching. So far what I used to do is: run a program that writes a large file between consecutive runs of the read benchmark. Something like
./read_benchmark
./write --size 64G --path /tmp/test.out
./read_benchmark
The write program simply writes an array of size 1G 64 times to file. Since the size of the main memory is 64G, I write a file that is approx. the same size. The problem is that writing takes a long time and I was wondering if there are better ways to do this, i.e. avoid effects seen when data is cached.
Also, what happens if I write data to /dev/null?
./write --size 64G --path /dev/null
This way, the write program exits very fast, no I/O is actually performed, but I am not sure if it overwrites 64G of main memory, which is what I ultimately want.
Your input is greatly appreciated.
You can drop all caches using a special file in /proc like this:
echo 3 > /proc/sys/vm/drop_caches
That should make sure cache does not affect the benchmark.
You can just unmount the filesystem and mount it back. Unmounting flushes and drops the cache for the filesystem.
Use echo 3 > /proc/sys/vm/drop_caches to flush the pagecache, directory entries cache and inodes cache.
You can use the fadvise calls with FADV_DONTNEED to tell the kernel to keep certain files from being cached. You can also use mincore() to verify that the file is not cached. While the drop_caches solution is clearly simpler, this might be better than wiping out the entire cache, as that affects all processes on the box. I don't think you need elevated privileges to use fadvise, while I bet you do for writing to /proc. Here is a good example of how to use fadvise calls for this purpose: http://insights.oetiker.ch/linux/fadvise/
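As a hedged sketch of that approach (evict is a made-up name; it assumes the file's data is already on disk, so for a file you just wrote, fsync() it first, since DONTNEED won't drop dirty pages):

    /* Sketch: evict one file from the page cache with posix_fadvise(),
     * then use mincore() to count how many of its pages remain resident.
     * Usage: ./evict <file> */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

        int fd = open(argv[1], O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }

        /* Ask the kernel to drop this file's cached pages (len 0 = whole file). */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

        struct stat st;
        fstat(fd, &st);
        size_t pagesz = (size_t)getpagesize();
        size_t npages = ((size_t)st.st_size + pagesz - 1) / pagesz;

        /* Map the file (without touching it) and ask which pages are resident. */
        void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        unsigned char *vec = malloc(npages);
        size_t resident = 0;
        if (vec && mincore(map, st.st_size, vec) == 0)
            for (size_t i = 0; i < npages; i++)
                resident += vec[i] & 1;

        printf("%zu of %zu pages still cached\n", resident, npages);
        munmap(map, st.st_size);
        free(vec);
        close(fd);
        return 0;
    }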
One (crude) way that almost never fails is to simply occupy all that excess memory with another program.
Make a trivial program that allocates nearly all the free memory (while leaving enough for your benchmark app). Then memset() the memory to something to ensure that the OS will commit it to physical memory. Finally, do a scanf() to halt the program without terminating it.
By "hogging" all the excess memory, the OS won't be able to use it as cache. And this works in both Linux and Windows. Now you can proceed to do your I/O benchmark.
(Though this might not go well if you're sharing the machine with other users...)
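A hedged sketch of such a hog (the 56 GiB target is an assumption for a 64 GiB box, so tune it to leave headroom for the benchmark and the OS; it pauses on getchar() rather than scanf(), same idea):

    /* Crude cache squeezer: grab most of free RAM, touch every page so it
     * is really committed, then wait. Exiting releases everything. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define HOG_BYTES (56ULL * 1024 * 1024 * 1024)  /* adjust per machine */
    #define CHUNK     (1024ULL * 1024 * 1024)       /* 1 GiB at a time */

    int main(void)
    {
        for (unsigned long long done = 0; done < HOG_BYTES; done += CHUNK) {
            void *p = malloc(CHUNK);
            if (!p) { fprintf(stderr, "gave up at %llu GiB\n", done >> 30); break; }
            memset(p, 0xA5, CHUNK);   /* fault every page into physical RAM */
        }

        puts("RAM hogged; run the benchmark, then press Enter here to release");
        getchar();                    /* halt without terminating */
        return 0;
    }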