Obtaining CPU and Memory Information Inside a System Call

I am trying to obtain CPU and memory usage information for the current process inside a system call.
I can get the current process's name, PID, and UID by using:
current->comm //process name
current->pid //process id
current_uid() //uid
but that seems to be all. (I am using kernel 3.2.0-24-generic.)
From the question "Memory usage of current process in C", it seems that reading (with vfs_read) and parsing /proc/pid/status is the only way to get memory and CPU usage.
Is there a better way to obtain this information, or am I on the right track?
I also test my code as a kernel module first, since both system calls and kernel modules run in kernel space. Is that also a bad approach?

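current->mm (a struct mm_struct) is where the process's memory-management information is stored. It is NULL for kernel threads, so check it before dereferencing it.
current->mm->mmap is a linked list (chained through vm_next) of the process's memory mappings, each a struct vm_area_struct, so you can iterate over it and see what you find there. On a 3.2 kernel, hold current->mm->mmap_sem for reading while you walk it.
current->utime and current->stime (the user and system CPU time consumed by the task) may be useful for getting CPU information.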

Related

Modify read-only memory at low overhead

Assume that I have a page of memory that is read-only (e.g., set through mmap/mprotect). How do I modify one word (8 bytes) on this page at the lowest possible overhead?
Some context: I assume x86-64 Linux as my runtime environment. The page is read-only to protect some important data, which the program must read frequently, against rogue or illegal modifications. Only a few places are allowed to modify the data on the page, and I know all of these locations, as well as the address of the page, statically. The problem I'm trying to solve is protecting this data against memory-safety bugs in the program while still allowing the few authorized places to modify it. The modifications are rare, but frequent enough that several kernel round trips (through system calls) per modification are too costly.
So far, I have thought of the following solutions:
mprotect
ptrace
shared memory
new system call
mprotect
mprotect(addr, 4096, PROT_WRITE | PROT_READ);
addr[12] = 0xc0fec0fe;
mprotect(addr, 4096, PROT_READ);
The mprotect solution is clean, simple, and straightforward. Unfortunately, it involves two round trips into the kernel, which adds overhead. In addition, the whole page is writable during that window, so another thread could modify that memory concurrently.
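A self-contained version of the mprotect approach might look like the following sketch. It assumes the protected data lives in its own page obtained with mmap(), and it asks sysconf(_SC_PAGESIZE) for the page size instead of hard-coding 4096. The helper names alloc_protected_page and write_protected_word are hypothetical. Each write still costs two mprotect() calls, and the page stays writable for every thread in between.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate one zero-filled, page-aligned page and make it read-only.
   Returns NULL on failure. */
static uint64_t *alloc_protected_page(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mprotect(p, page, PROT_READ) != 0) {
        munmap(p, page);
        return NULL;
    }
    return p;
}

/* Write one 8-byte word into the read-only page at page_base (which must be
   page-aligned). Two mprotect() calls = two kernel round trips; while the
   page is writable, other threads can write to it too.
   Returns 0 on success, -1 on error. */
static int write_protected_word(uint64_t *page_base, size_t index, uint64_t value)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (index >= page / sizeof(uint64_t))
        return -1;                                   /* outside the page */
    if (mprotect(page_base, page, PROT_READ | PROT_WRITE) != 0)
        return -1;
    page_base[index] = value;
    return mprotect(page_base, page, PROT_READ);     /* back to read-only */
}
```

For example, write_protected_word(page, 12, 0xc0fec0fe) corresponds to the three-line snippet in the question.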
ptrace
Unfortunately, ptracing yourself is no longer possible (a ptraced process needs to be stopped). So the solution is to fork, ptrace the child process, and then use PTRACE_POKETEXT to write to the child process's memory.
This option has the drawback of spawning an additional (tracer) process and will cause problems if the tracee itself uses multiple processes. The overhead per write is at least one ptrace system call plus the required synchronization between the processes.
shared memory
Shared memory is similar to the ptrace solution except that it eliminates the system call. Both processes map the same shared memory with different permissions: read-write in the parent, which carries out the writes, and read-only in the child. The two processes still need to synchronize on each write. Shared memory has the same drawbacks as the ptrace solution: added complexity and incompatibility with programs that use multiple communicating processes.
new system call
Adding a new system call to the kernel would solve the problem: a single system call could modify one word in the process without changing the page tables or setting up multiple communicating processes.
Is there anything faster than the four solutions sketched above? Could I rely on any debugging features? Are there any other neat low-level systems tricks?

Scanning memory in C/UNIX

I need to scan the entire memory of the calling process of my program and separately check which blocks are read-only, read-write, or inaccessible. It sounds pretty straightforward, but I'm having trouble getting started. Can anyone point me in the right direction by suggesting relevant functions for scanning the memory of a calling process?
For example, to start off, how would I obtain the starting and ending memory addresses of the calling process?
This might be kernel-dependent, but on Linux the /proc file system gives access to it:
/proc/[pid]/mem contains the memory of a process, so you just have to identify your parent's PID and, if you have access, you can scan it.
The actual layout of the file will depend somewhat on the executable type and kernel in question.
http://linux.die.net/man/5/proc
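The start and end addresses and the permission bits of every mapping are listed in the companion file /proc/[pid]/maps, documented on the same proc(5) page. A minimal sketch, assuming Linux, that classifies each mapping of the current process by its r/w bits (classify_regions and struct region_counts are hypothetical names):

```c
#include <stdio.h>
#include <string.h>

/* Number of mappings in each access class. */
struct region_counts { int read_only; int read_write; int inaccessible; int other; };

/* Parse a maps file such as "/proc/self/maps" or "/proc/1234/maps" and
   classify each mapping by its permission field (e.g. "r-xp", "rw-p").
   Returns 0 on success, -1 if the file cannot be opened. */
static int classify_regions(const char *maps_path, struct region_counts *out, int verbose)
{
    FILE *f = fopen(maps_path, "r");
    if (!f)
        return -1;
    memset(out, 0, sizeof *out);
    char line[4096];
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char perms[5];
        /* Each line begins "start-end perms ..." with hex addresses. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        const char *kind;
        if (perms[0] == 'r' && perms[1] == 'w') {
            out->read_write++; kind = "read-write";
        } else if (perms[0] == 'r') {
            out->read_only++; kind = "read-only";        /* includes r-x code */
        } else if (perms[0] == '-' && perms[1] == '-' && perms[2] == '-') {
            out->inaccessible++; kind = "inaccessible";  /* e.g. guard pages */
        } else {
            out->other++; kind = "other";
        }
        if (verbose)
            printf("%#lx-%#lx %s %s\n", start, end, perms, kind);
    }
    fclose(f);
    return 0;
}
```

Reading /proc/[pid]/mem of another process, such as the parent, additionally requires ptrace-level access to it; the maps file tells you which address ranges are worth reading.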

How to measure the memory usage of a process without calling an external program

The memory usage of a process can be displayed by running:
$ ps -C processname -o size
SIZE
3808
Is there any way to retrieve this information without executing ps (or any external program), or reading /proc?
On a Linux system, a process's memory usage can be queried by reading /proc/[pid]/statm, where [pid] is the PID of the process. If a process wants to query its own data, it can read /proc/self/statm instead. man 5 proc says:
/proc/[pid]/statm
    Provides information about memory usage, measured in pages.
    The columns are:
        size       total program size
                   (same as VmSize in /proc/[pid]/status)
        resident   resident set size
                   (same as VmRSS in /proc/[pid]/status)
        share      shared pages (from shared mappings)
        text       text (code)
        lib        library (unused in Linux 2.6)
        data       data + stack
        dt         dirty pages (unused in Linux 2.6)
You can simply open the file with fopen("/proc/self/statm", "r") and read its contents.
Since the file reports values in pages, you will also need the page size: getpagesize() returns the size of a page in bytes.
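A minimal sketch of that approach, reading only the first three columns (size, resident, share) and converting pages to bytes with getpagesize(); read_self_statm and struct statm_bytes are hypothetical names:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>

/* First three /proc/self/statm columns, converted from pages to bytes. */
struct statm_bytes { unsigned long long size, resident, shared; };

/* Returns 0 on success, -1 on failure. */
static int read_self_statm(struct statm_bytes *out)
{
    unsigned long size, resident, shared;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "%lu %lu %lu", &size, &resident, &shared);
    fclose(f);
    if (n != 3)
        return -1;
    unsigned long long page = (unsigned long long)getpagesize();
    out->size = size * page;          /* same as VmSize */
    out->resident = resident * page;  /* same as VmRSS */
    out->shared = shared * page;
    return 0;
}
```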
You have a few options for finding the memory usage of a program:
Run it within a profiler like Valgrind or memprof.
exec/proc_open/fork a new process to use ps, top, or pmap as you would from the command line
Bundle ps into your app and use it directly (it's open source, of course!)
Use the /proc system (which is all that ps does, anyway...)
Get a report from the kernel, which watches over process memory operations. The /proc filesystem is just a view into the kernel's internal data structures, so this has really already been done for you.
Develop your own mechanism to compute memory usage without kernel assistance.
The first options are all educational from a system administration perspective and would be the best choices in a real-life situation, but the last bullet point is probably the most interesting. You'd probably want to read the source of Valgrind or memprof to see how they work, but essentially you'd need to insert your mechanism between the app and the kernel and intercept any requests for memory allocation. Additionally, when the process starts, you would initialize its memory space with a preset value such as 0xDEADBEEF. Then, after the process finishes, you could read through the memory space and count the words that differ from the preset value, giving you an estimate of memory usage.
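The counting step of that sentinel idea can be illustrated on a single buffer. This is only a toy sketch under strong assumptions: it paints one region you control rather than a whole address space, and it does not intercept any allocations. paint_region and count_touched_words are hypothetical helpers. Because a word that happens to be written with 0xDEADBEEF itself is not counted, the result is a lower bound.

```c
#include <stddef.h>
#include <stdint.h>

#define SENTINEL 0xDEADBEEFu  /* preset fill value from the answer */

/* Fill the region with the sentinel before the code under test runs. */
static void paint_region(uint32_t *region, size_t words)
{
    for (size_t i = 0; i < words; i++)
        region[i] = SENTINEL;
}

/* Afterwards, count words that no longer hold the sentinel. */
static size_t count_touched_words(const uint32_t *region, size_t words)
{
    size_t touched = 0;
    for (size_t i = 0; i < words; i++)
        if (region[i] != SENTINEL)
            touched++;
    return touched;
}
```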
Of course, things are always more complicated than they seem. What about memory used by shared libraries? Pipes? Memory shared between your process and another? System calls? Virtual memory that is allocated but not used? Data buffered to disk? There are a lot of judgment calls to be made beyond your question of 'memory of a process'; see this post for some additional concerns.

Memory Optimization for child processes

I work on Linux on an ARM processor for a cable modem. I have written a tool that sends (storms) customized UDP packets using raw sockets. I form each packet from scratch so that we have the flexibility to play with different options. This tool is mainly for stress-testing routers.
I have multiple interfaces created. Each interface obtains IP addresses using DHCP. This is done to make the modem behave as virtual customer premises equipment (vCPE).
When the system comes up, I start the processes that are requested. Every process I start continuously sends packets: process 0 sends packets using interface 0, and so on. Each of these processes allows configuration at run time (changes to UDP parameters and other options). That's the reason I decided to have separate processes.
I start these processes using fork and exec from the provisioning processes of the modem.
The problem now is that each process takes up a lot of memory. Starting just 3 such processes causes the system to crash and reboot.
I have tried the following:
I have always assumed that pushing more code into shared libraries would help, so I moved many functions into a shared library and kept minimal code in the processes. To my surprise, it made no difference. I also removed all arrays and made them use the heap instead, but that made no difference either. Maybe this is because the processes run continuously, so it doesn't matter whether the memory is on the stack or the heap? I suspect that the process from which I call fork is huge, and that is why the processes I create are huge too. I am not sure how else to go about this. Say process A is huge, and I start process B by fork and exec; B inherits A's memory area. Would it help if A starts C, which in turn starts B? Probably not, since C still inherits A's memory. I used vfork as an alternative, which did not help either, and I wonder why.
I would appreciate any tips to help me reduce the memory used by each independent child process.
Given this is a test tool, then the most efficient thing to do is to add more memory to the testing machine.
Failing that:
How are you measuring memory usage? Some methods don't get accurate results.
Check that you don't have any memory leaks, e.g. with Valgrind on Linux x86.
You could try running the different testers in a single process, as different threads, or even multiplexed in a single thread, since the network should be the limiting factor.
exec() will shrink the process's memory size, as the new program gets a fresh memory map.
If you can't add physical memory, then maybe you can add swap, maybe just for testing?
Not technically answering your question, but providing a couple of alternative solutions:
If you are using Linux, have you considered using pktgen? It is a flexible tool for sending UDP packets from the kernel as fast as the interface allows, which is much faster than a userspace tool.
Oh, and a shameless plug: I have made a multi-threaded network testing tool that could be used to spam the network with UDP packets. It can operate in multi-process mode (using fork) or multi-thread mode (using pthreads). The pthreads mode might use less RAM, so it might be better for you. If nothing else, it might be worth looking at the source, as I've spent many years improving this code, and it has been able to generate enough packets to saturate a 10 Gbps interface.
What could be happening is that the fork call in process A requires a significant amount of RAM + swap (if any). When you call fork() from this process, the kernel must reserve enough RAM and swap for the child process to have its own copy (copy-on-write, actually) of the parent process's writable private memory, namely its stack and heap. When you call exec() from the child process, that memory is no longer needed, and your child process can have its own, smaller private working set.
So, the first thing to make sure of is that you never have more than one process at a time in the state between fork() and exec(). In this state, the child process must have a duplicate of its parent process's virtual memory space.
Second, try the overcommit settings, which allow the kernel to reserve more memory than actually exists. These are /proc/sys/vm/overcommit*. You can get away with overcommit because your child processes only need the extra VM space until they call exec, and shouldn't actually touch the duplicated address space of the parent process.
Third, in your parent process you can allocate the largest blocks using shared memory rather than the stack or heap, which are private. When you fork, those shared memory regions will then be shared with the child process rather than duplicated copy-on-write.
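A minimal sketch of the third suggestion, assuming Linux: the large block is created as an anonymous MAP_SHARED mapping instead of with malloc(), so after fork() the child refers to the same pages instead of a copy-on-write duplicate. The helper names are hypothetical, and the demonstration function only checks that a write made by the child is visible in the parent.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocate a large block as an anonymous MAP_SHARED mapping instead of
   using malloc(). The pages start out zero-filled. Returns NULL on failure. */
static void *alloc_shared_block(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Fork a child that writes into the shared block, then check in the parent
   whether the write is visible (it would not be for private heap or stack
   memory). Returns 1 if visible, 0 if not, -1 on error. */
static int child_write_is_shared(size_t len)
{
    int status, seen;
    unsigned char *block = alloc_shared_block(len);
    if (block == NULL)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        munmap(block, len);
        return -1;
    }
    if (pid == 0) {
        block[len - 1] = 0x5a;   /* child writes the last byte */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0) {
        munmap(block, len);
        return -1;
    }
    seen = (block[len - 1] == 0x5a);
    munmap(block, len);
    return seen;
}
```

Once the child calls exec(), the mapping is dropped from the child, so the saving matters for the window between fork() and exec() (or for children that never exec).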

Memory usage of a child process?

I'm running a sort of "sandbox" in C on Ubuntu: it takes a program, and runs it safely under the user nobody (and intercepts signals, etc). Also, it assigns memory and time limits, and measures time and memory usage.
(In case you're curious, it's for a sort of "online judge" to mark programs on test data)
Currently I've adapted the safeexec module from mooshak. Though most things work properly, the memory usage measurement seems to be a problem (it's highly inaccurate).
Now I've tried the advice here and parsed VM from /proc/pid/stat, and the accuracy problem is fixed. However, for programs that finish really quickly, it doesn't work and just returns 0.
The safeexec program seems to work like this:
It fork()s
Uses execv() in the child process to run the desired program
Monitors the program from the parent process until the child process terminates (using wait4, which happens to return CPU usage - but not memory?)
So it parses /proc/../stat of the child process (which has been replaced by the execv)
So why is VM in /proc/child_pid/stat sometimes equal to 0?
Is it because the execv() finishes too quickly, and /proc/child_pid/stat just isn't available?
If so, is there some sort of other way to get the memory usage of the child?
(Since this is meant to judge programs under a time limit, I can't afford something with a performance penalty like valgrind)
Thanks in advance.
Can you arrange for the child process to use your own version of malloc() et al. and have it log the high-water mark (HWM) of memory usage (perhaps using a handler registered with atexit())? You could use LD_PRELOAD to load your memory management library. This won't help with huge static arrays or huge automatic arrays.
Hmm, sounds interesting. Any way to track the static/automatic arrays, though?
Static memory can be analyzed with the 'size' command - more or less.
Automatic arrays are a problem; I'm not sure how you could handle those. Your memory allocation code could check how much stack is in use when it is called (by looking at the address of a local variable). But there's no guarantee that an allocation happens while the maximum amount of local array space is in use, so it gives at best a crude measure.
One other thought: perhaps you could use debugger technology - the ptrace() system call - to control the child process, and in particular, to hold it up for long enough to be able to collect the memory usage statistics from /proc/....
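One way to make the ptrace() idea concrete, sketched under the assumption of Linux and a tracer that is allowed to trace its own child: request PTRACE_O_TRACEEXIT so that the kernel stops the child just before it exits, while its address space still exists, and read VmHWM (peak resident set size) from /proc/<pid>/status at that moment. The helper names are hypothetical, and the loop simply forwards ordinary signals without distinguishing every kind of ptrace stop.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the "VmHWM:" value (peak RSS, in kB) from /proc/<pid>/status,
   or -1 if it cannot be read. */
static long read_vmhwm_kb(pid_t pid)
{
    char path[64], line[256];
    long kb = -1;
    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "VmHWM: %ld kB", &kb) == 1)
            break;
    fclose(f);
    return kb;
}

/* Run argv[0] as a traced child and sample VmHWM at PTRACE_EVENT_EXIT.
   Returns the peak RSS in kB, or -1 on failure. */
static long run_and_measure_peak_rss(char *const argv[])
{
    int status;
    long peak = -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execv(argv[0], argv);  /* on success the child stops with SIGTRAP */
        _exit(127);
    }
    /* First stop: the SIGTRAP delivered after a successful execv(). */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    ptrace(PTRACE_SETOPTIONS, pid, NULL,
           (void *)(long)(PTRACE_O_TRACEEXIT | PTRACE_O_TRACEEXEC));
    ptrace(PTRACE_CONT, pid, NULL, NULL);

    while (waitpid(pid, &status, 0) > 0) {
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                          /* child is gone and reaped */
        if (!WIFSTOPPED(status))
            continue;
        int event = status >> 16;           /* nonzero for ptrace event stops */
        int sig = 0;
        if (event == PTRACE_EVENT_EXIT)
            peak = read_vmhwm_kb(pid);      /* stopped just before exiting */
        else if (event == 0)
            sig = WSTOPSIG(status);         /* forward ordinary signals */
        ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
    }
    return peak;
}
```

Sampling at the exit stop avoids the race described in the question, where a fast program may be gone before its /proc entry is read.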
You could set a hard resource limit (setrlimit() with the RLIMIT_AS resource) before execve(). The program will then not be able to allocate more than that amount of memory: if it tries, memory allocation calls (brk, mmap, mremap) will fail. If the program does not handle the out-of-memory condition, it will typically segfault, which will be reflected in the status returned by wait4.
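A minimal sketch of this, assuming the limit is applied in the child between fork() and execv(); run_with_as_limit is a hypothetical name, and both the soft and hard limits are set to the same value:

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, cap the child's address space at limit_bytes with RLIMIT_AS, then
   execv() the program. Returns the raw wait status, or -1 on error. */
static int run_with_as_limit(char *const argv[], rlim_t limit_bytes)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl = { limit_bytes, limit_bytes };  /* soft, hard */
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(126);
        execv(argv[0], argv);
        _exit(127);  /* exec failed, possibly because the limit is too small */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;   /* inspect with WIFEXITED / WIFSIGNALED */
}
```

Because RLIMIT_AS counts address space rather than resident memory, the limit also covers shared libraries and virtual memory that is reserved but never touched, so it needs some headroom above the expected working set.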
You can use the getrusage(2) function, declared in <sys/resource.h>.
Link: https://linux.die.net/man/2/getrusage
This function fills in a struct rusage whose ru_maxrss field, when called with RUSAGE_CHILDREN, holds the resident set size (in kilobytes on Linux) of the largest child among all the children of the current process that have terminated and been waited for.
This function can also be called from the main process after all the child processes have terminated.
To get the information, try something like this:
struct rusage usage;
int a = getrusage(RUSAGE_CHILDREN, &usage);
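A fuller version of that snippet, as a sketch: it runs one program, waits for it (children only count once they have been waited for), and then reads ru_maxrss, which Linux reports in kilobytes. children_max_rss_after is a hypothetical name.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program to completion, then return ru_maxrss from
   getrusage(RUSAGE_CHILDREN): the largest RSS of any terminated,
   waited-for child so far. Returns -1 on error. */
static long children_max_rss_after(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)  /* must be reaped to be counted */
        return -1;
    struct rusage usage;
    if (getrusage(RUSAGE_CHILDREN, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}
```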
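But there is a little trick.
If you want information about every child process's memory usage (not only the biggest one), you must fork() your program twice: the first fork gives you an independent intermediate process, and the second one is the process you'd like to test. Because the intermediate process has only that one child, getrusage(RUSAGE_CHILDREN) called there after waiting for it reports that child alone.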
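As an alternative to forking twice, wait4() fills in a struct rusage for the specific child it reaps, and on Linux (since 2.6.32) that includes ru_maxrss, so the per-child peak can be read directly. A minimal sketch, with a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program and return the peak RSS (ru_maxrss, kB on Linux) of that
   specific child as reported by wait4(). The wait status is stored in
   *status_out if it is non-NULL. Returns -1 on error. */
static long child_peak_rss_wait4(char *const argv[], int *status_out)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);
    }
    int status;
    struct rusage usage;
    if (wait4(pid, &status, 0, &usage) < 0)
        return -1;
    if (status_out)
        *status_out = status;
    return usage.ru_maxrss;
}
```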
