How to measure the memory usage of a process without calling an external program

How to measure the memory usage of a process without calling an external program - c

The memory usage of a process can be displayed by running:
$ ps -C processname -o size
SIZE
3808
Is there any way to retrieve this information without executing ps (or any external program), or reading /proc?

On a Linux system, a process' memory usage can be queried by reading /proc/[pid]/statm. Where [pid] is the PID of the process. If a process wants to query its own data, it can do so by reading /proc/self/statm instead. man 5 proc says:
/proc/[pid]/statm
Provides information about memory usage, measured in pages. The
columns are:
size total program size
(same as VmSize in /proc/[pid]/status)
resident resident set size
(same as VmRSS in /proc/[pid]/status)
share shared pages (from shared mappings)
text text (code)
lib library (unused in Linux 2.6)
data data + stack
dt dirty pages (unused in Linux 2.6)
You could just open the file with: fopen("/proc/self/statm", "r") and read the contents.
Since the file returns results in 'pages', you will want to find the page size also. getpagesize () returns the size of a page, in bytes.

You have a few options to do find the memory usage of a program:
Run it within a profiler like Valgrind or memprof.
exec/proc_open/fork a new process to use ps, top, or pmap as you would from the command line
bundle the ps into your app and use it directly (it's open source, of course!)
Use the /proc system (which is all that ps does, anyways...)
Create a report the kernel, which watches over process memory operations. The /proc filesystem is just a view into the kernel's internal data structures, so this is really already done for you.
Develop your own mechanism to compute memory usage without kernel assistance.
The former are all educational from a system administration perspective, and would be the best options in a real-life situation, but the last bullet point is probably the most interesting. You'd probably want to read the source of Valgrind or memprof to see how it works, but essentially what you'd need to do is insert your mechanism between the app and the kernel, and intercept any requests for memory allocation. Additionally, when the process started, you would want to initialize its memory space with a preset value like 0xDEADBEEF. Then, after the process finished, you could read through the memory space and count the occurrences of words other than your preset value, giving you an estimate of memory usage.
Of course, things are always more complicated than they seem. What about memory used by shared libraries? Pipes? Shared memory between your processes and another? System calls? Virtual memory allocated but not used? Data buffered to the disk? There's a lot of calls to be made beyond your question 'memory of process', see this post for some additional concerns.

Related

windows - plain shared memory between 2 processes (no file mapping, no pipe, no other extra)

How to have an isolated part of memory, that is NOT at all backed to any file or extra management layers such as piping and that can be shared between two dedicated processes on the same Windows machine?
Majority of articles point me into the direction of CreateFileMapping. Let's start from there:
How does CreateFileMapping with hFile=INVALID_HANDLE_VALUE actually work?
According to
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366537(v=vs.85).aspx
it
"...creates a file mapping object of a specified size that is backed by the system paging file instead of by a file in the file system..."
Assume I write something into the memory which is mapped by CreateFileMapping with hFile=INVALID_HANDLE_VALUE. Under which conditions will this content be written to the page file on disk?
Also my understanding of what motivates the use of shared memory is to keep performance up and optimized. Why is the article "Creating Named Shared Memory"
(https://msdn.microsoft.com/de-de/library/windows/desktop/aa366551(v=vs.85).aspx) refering to
CreateFileMapping, if there is not a single attribute combination, that would prevent writing to files, e.g. the page file?
Going back to the original question: I am afraid, that CreateFileMapping is not good enough... So what would work?

You misunderstand what it means for memory to be "backed" by the system paging file. (Don't feel bad; Raymond Chen has described the text you quoted from MSDN as "one of the most misunderstood sentences in the Win32 documentation.") Almost all of the computer's memory is "backed" by something on disk; only the "non-paged pool", used exclusively by the kernel and as little as possible, isn't. If a page isn't backed by an ordinary named file, then it's backed by the system paging file. The operating system won't write pages out to the system paging file unless it needs to, but it can if it does need to.
This architecture is intended to ensure that processes can be completely "paged out" of RAM when they have nothing to do. This used to be much more important than it is nowadays, but it's still valuable; a typical Windows desktop will have dozens of processes "idle" waiting for events (e.g. needing to spool a print job) that may never happen. Those processes can get paged out and the memory can be put to more constructive use.
CreateFileMapping with hfile=INVALID_HANDLE_VALUE is, in fact, what you want. As long as the processes sharing the memory are actively doing stuff with it, it will remain resident in RAM and there will be no performance problem. If they go idle, yeah, it may get paged out, but that's fine because they're not doing anything with it.
You can direct the system not to page out a chunk of memory; that's what VirtualLock is for. But it's meant to be used for small chunks of memory containing secret information, where writing it to the page file could conceivably leak the secret. The MSDN page warns you that "Each version of Windows has a limit on the maximum number of pages a process can lock. This limit is intentionally small to avoid severe performance degradation."

When a binary file runs, does it copy its entire binary data into memory at once? Could I change that?

Does it copy the entire binary to the memory before it executes? I am interested in this question and want to change it into some other way. I mean, if the binary is 100M big (seems impossible), I could run it while I am copying it into the memory. Could that be possible?
Or could you tell me how to see the way it runs? Which tools do I need?

The theoretical model for an application-level programmer makes it appear that this is so. In point of fact, the normal startup process (at least in Linux 1.x, I believe 2.x and 3.x are optimized but similar) is:
The kernel creates a process context (more-or-less, virtual machine)
Into that process context, it defines a virtual memory mapping that maps
from RAM addresses to the start of your executable file
Assuming that you're dynamically linked (the default/usual), the ld.so program
(e.g. /lib/ld-linux.so.2) defined in your program's headers sets up memory mapping for shared libraries
The kernel does a jmp into the startup routine of your program (for a C program, that's
something like crtprec80, which calls main). Since it has only set up the mapping, and not actually loaded any pages(*), this causes a Page Fault from the CPU's Memory Management Unit, which is an interrupt (exception, signal) to the kernel.
The kernel's Page Fault handler loads some section of your program, including the part
that caused the page fault, into RAM.
As your program runs, if it accesses a virtual address that doesn't have RAM backing
it up right now, Page Faults will occur and cause the kernel to suspend the program
briefly, load the page from disc, and then return control to the program. This all
happens "between instructions" and is normally undetectable.
As you use malloc/new, the kernel creates read-write pages of RAM (without disc backing files) and adds them to your virtual address space.
If you throw a Page Fault by trying to access a memory location that isn't set up in the virtual memory mappings, you get a Segmentation Violation Signal (SIGSEGV), which is normally fatal.
As the system runs out of physical RAM, pages of RAM get removed; if they are read-only copies of something already on disc (like an executable, or a shared object file), they just get de-allocated and are reloaded from their source; if they're read-write (like memory you "created" using malloc), they get written out to the ( page file = swap file = swap partition = on-disc virtual memory ). Accessing these "freed" pages causes another Page Fault, and they're re-loaded.
Generally, though, until your process is bigger than available RAM — and data is almost always significantly larger than the executable — you can safely pretend that you're alone in the world and none of this demand paging stuff is happening.
So: effectively, the kernel already is running your program while it's being loaded (and might never even load some pages, if you never jump into that code / refer to that data).
If your startup is particularly sluggish, you could look at the prelink system to optimize shared library loads. This reduces the amount of work that ld.so has to do at startup (between the exec of your program and main getting called, as well as when you first call library routines).
Sometimes, linking statically can improve performance of a program, but at a major expense of RAM — since your libraries aren't shared, you're duplicating "your libc" in addition to the shared libc that every other program is using, for example. That's generally only useful in embedded systems where your program is running more-or-less alone on the machine.
(*) In point of fact, the kernel is a bit smarter, and will generally preload some pages
to reduce the number of page faults, but the theory is the same, regardless of the
optimizations

No, it only loads the necessary pages into memory. This is demand paging.
I don't know of a tool which can really show that in real time, but you can have a look at /proc/xxx/maps, where xxx is the PID of your process.

While you ask a valid question, I don't think it's something you need to worry about. First off, a binary of 100M is not impossible. Second, the system loader will load the pages it needs from the ELF (Executable and Linkable Format) into memory, and perform various relocations, etc. that will make it work, if necessary. It will also load all of its requisite shared library dependencies in the same way. However, this is not an incredibly time-consuming process, and one that doesn't really need to be optimized. Arguably, any "optimization" would have a significant overhead to make sure it's not trying to use something that hasn't been loaded in its due course, and would possibly be less efficient.
If you're curious what gets mapped, as fge says, you can check /proc/pid/maps. If you'd like to see how a program loads, you can try running a program with strace, like:
strace ls
It's quite verbose, but it should give you some idea of the mmap() calls, etc.

How to check stack and heap usage with C on Linux?

Is there any way to retrieve the stack and heap usage in C on Linux?
I want to know the amount of memory taken specifically by stack/heap.

If you know the pid (e.g. 1234) of the process, you could use the pmap 1234 command, which print the memory map. You can also read the /proc/1234/maps file (actually, a textual pseudo-file because it does not exist on disk; its content is lazily synthesized by the kernel). Read proc(5) man page. It is Linux specific, but inspired by /proc file systems on other Unix systems.
(you'll better open, read, then close that pseudo-file quickly; don't keep a file descriptor on it open for many seconds; it is more a "pipe"-like thing, since you need to read it sequentially; it is a pseudo-file without actual disk I/O involved)
And from inside your program, you could read the /proc/self/maps file. Try the cat /proc/self/maps command in a terminal to see the virtual address space map of the process running that cat command, and cat /proc/$$/maps to see the map of your current shell.
All this give you the memory map of a process, and it contains the various memory segments used by it (notably space for stack, for heap, and for various dynamic libraries).
You can also use the getrusage system call.
Notice also that with multi-threading, each thread of a process has its own call stack.
You could also parse the /proc/$pid/statm or /proc/self/statm pseudo-file, or the /proc/$pid/status or /proc/self/status one.
But see also Linux Ate my RAM for some hints.
Consider using valgrind (at least on Linux) to debug memory leaks.

mmaping large files(for persistent large arrays)

I'm implementing persistent large constant arrays via mmap. Is there any tips and tricks or gotchas one should be aware when using mmap?

All pointers that are stored inside the mmap'd region should be done as offsets from the base of the mmap'd region, not as real pointers! You won't necessarily be getting the same base address when you mmap the region on the next run of the program. (I have had to clean up code that made incorrect assumptions about mmap region base address constancy).

This is the most straight forward use case for mmap() so there shouldn't be much to trip you up.
You are effectively just loading a large constant array. Being constants you shouldn't need to worry about synchronization. It would be advisable to make sure the prot parameter is set to PROT_READ only since you won't be writing.
If one or more programs using the constants are going to be continually run, it might be worthwhile to have a separate program that loads the data and keeps it resident. Runs of the other programs then essentially are just doing an shared memory attach rather than continually reading the file into memory.

Make sure you check for restrictions on open file size or memory usage. On Linux there is a built in shell command ulimit. Run as ulimit -a to see the current settings.
Flush writes to the in-memory array to the file with the msync(2) syscall or else they may stay in memory until munmap(2) and there may be a power outage or something before then!
If multiple processes are mmap'ing the same memory region shared with read and write privileges, make sure that only one is writing to it at a time to avoid corrupting your data. Or use file locking or some other means of synchronization.

C/C++ memory usage API in Linux/Windows

I'd like to obtain memory usage information for both per process and system wide. In Windows, it's pretty easy. GetProcessMemoryInfo and GlobalMemoryStatusEx do these jobs greatly and very easily. For example, GetProcessMemoryInfo gives "PeakWorkingSetSize" of the given process. GlobalMemoryStatusEx returns system wide available memory.
However, I need to do it on Linux. I'm trying to find Linux system APIs that are equivalent GetProcessMemoryInfo and GlobalMemoryStatusEx.
I found 'getrusage'. However, max 'ru_maxrss' (resident set size) in struct rusage is just zero, which is not implemented. Also, I have no idea to get system-wide free memory.
Current workaround for it, I'm using "system("ps -p %my_pid -o vsz,rsz");". Manually logging to the file. But, it's dirty and not convenient to process the data.
I'd like to know some fancy Linux APIs for this purpose.

You can see how it is done in libstatgrab.
And you can also use it (GPL)

Linux has a (modular) filesystem-interface for fetching such data from the kernel, thus being usable by nearly any language or scripting tool.
Memory can be complex. There's the program executable itself, presumably mmap()'ed in. Shared libraries. Stack utilization. Heap utilization. Portions of the software resident in RAM. Portions swapped out. Etc.
What exactly is "PeakWorkingSetSize"? It sounds like the maximum resident set size (the maximum non-swapped physical-memory RAM used by the process).
Though it could also be the total virtual memory footprint of the entire process (sum of the in-RAM and SWAPPED-out parts).
Irregardless, under Linux, you can strace a process to see its kernel-level interactions. "ps" gets its data from /proc/${PID}/* files.
I suggest you cat /proc/${PID}/status. The Vm* lines are quite useful.
Specifically: VmData refers to process heap utilization. VmStk refers to process stack utilization.
If you continue using "ps", you might consider popen().
I have no idea to get system-wide free memory.
There's always /usr/bin/free
Note that Linux will make use of unused memory for buffering files and caching... Thus the +/-buffers/cache line.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight