I've got an application that makes use of System V shared memory segments. Normally it manages these internally and no one needs to touch them. But for emergencies we've got a utility that manually clears the shared memory segments.
The problem is that to do it, it runs ipcs, and grabs chunks of the output using cut. That seems pretty fragile. It already runs slightly different commands on different platforms to reflect the fact that the ipcs output is formatted differently on Linux / AIX / Solaris etc.
Is there a better way to find shared memory segments than parsing ipcs output?
You could reimplement your own version of ipcs that gives the same output regardless of the OS. This requires some system-level programming though.
On Linux ipcs uses shmctl(0, SHM_INFO, ...) to find out the index of the highest used shared memory segment, then runs shmctl(index, SHM_STAT, ...) in a loop over all indexes from 0 to the highest index in order to obtain information about each segment. This should also work on FreeBSD (not documented but apparent from the kernel source), although on that OS ipcs uses sysctl to read the values of kern.ipc.shm*.
On Solaris ipcs uses shmids(NULL, 0, &nids) to obtain the number of segments IDs, then calls shmids(&ids, nids, ...) to obtain the list of actual IDs and then uses shmctl(id, IPC_STAT, ...) to obtain information on each segment.
ipcs is a fairly old instrument and one would not expect its output to change much in the future, at least not until POSIX shared memory completely displaces SysV IPC.
Related
I was reading a paragraph from the "The Linux Kernel Module Programming Guide" and I have a couple of doubts related to the following paragraph.
The reason for copy_from_user or get_user is that Linux memory (on
Intel architecture, it may be different under some other processors)
is segmented. This means that a pointer, by itself, does not reference
a unique location in memory, only a location in a memory segment, and
you need to know which memory segment it is to be able to use it.
There is one memory segment for the kernel, and one for each of the
processes.
However it is my understanding that Linux uses paging instead of segmentation and that virtual addresses at and above 0xc0000000 have the kernel mapping in.
Do we use copy_from_user in order to accommodate older kernels?
Do the current linux kernels use segmentation in any way at all? If so how?
If (1) is not true, are there any other advantages to using copy_from_user?
Yeah. I don't like that explanation either. The details are essentially correct in a technical sense (see also Why does Linux on x86 use different segments for user processes and the kernel?) but as you say, linux typically maps the memory so that kernel code could access it directly, so I don't think it's a good explanation for why copy_from_user, etc. actually exist.
IMO, the primary reason for using copy_from_user / copy_to_user (and friends) is simply that there are a number of things to be checked (dangers to be guarded against), and it makes sense to put all of those checks in one place. You wouldn't want every place that needs to copy data in and out from user-space to have to re-implement all those checks. Especially when the details may vary from one architecture to the next.
For example, it's possible that a user-space page is actually not present when you need to copy to or from that memory and hence it's important that the call be made from a context that can accommodate a page fault (and hence being put to sleep).
Also, user-space data pointers need to be checked carefully to ensure that they actually point to user-space and that they point to data regions, and that the copy length doesn't wrap beyond the end of the valid regions, and so forth.
Finally, it's possible that user-space actually doesn't share the same page mappings with the kernel. There used to be a linux patch for 32-bit x86 that made the complete 4G of virtual address space available to user-space processes. In that case, kernel code could not make the assumption that a user-space pointer was directly accessible, and those functions might need to map individual user-space pages one at a time in order to access them. (See 4GB/4GB Kernel VM Split)
I have this a big server software that can hog 4-8GB of memory.
This makes fork-exec cumbersome, as the fork itself can take significant time, plus the default behavior seems to be that fork will fail unless there is enough memory for a copy of the entire resident memory.
Since this is starting to show as the hottest spot (60% of time spent in fork) when profiling I need to address it.
What would be the easiest way to avoid fork-exec routine?
You basically cannot avoid fork(2) (or the equivalent clone(2) syscall..., or the obsolete vfork which I don't recommend using) + execve(2) to start an external command (à la system(3), or à la posix_spawn) on Linux and (probably) MacOSX and most other Unix or POSIX systems.
What makes you think that it is becoming an issue? And 8GB process virtual address space is not a big deal today (at least on machines with 8Gbytes, or 16Gbytes RAM, like my desktop has). You don't practically need twice as much RAM (but you do need swap space) thanks to the lazy copy-on-write techniques used by all recent Unixes & Linux.
Perhaps you might believe that swap space could be an issue. On Linux, you could add swap space, perhaps by swapping on a file; just run as root:
dd if=/dev/zero of=/var/tmp/myswap bs=1M count=32768
mkswap /var/tmp/myswap
swapon /var/tmp/myswap
(of course, be sure that /var/tmp/ is not a tmpfs mounted filesystem, but sits on some disk, perhaps an SSD one....)
When you don't need any more a lot of swap space, run swapoff /var/tmp/myswap....
You could also start some external shell process near the beginning of your program (à la popen) and later you might send shell commands to it. Look at my execicar.c program for inspiration, or use it if it fits (I wrote it 10 years ago for similar purposes, but I forgot the details)
Alternatively fork at the beginning of your program some interpreter (Lua, Guile...) and send some commands to it.
Running more than a few dozens commands per second (starting any external program) is not reasonable, and should be considered as a design mistake, IMHO. Perhaps the commands that you are running could be replaced by in-process functions (e.g. /bin/ls can be done with stat, readdir, glob functions ...). Perhaps you might consider adding some plugin ability (with dlopen(3) & dlsym) to your code (and run functions from plugins instead of starting very often the same programs). Or perhaps embed an interpreter (Lua, Guile, ...) inside your code.
As an example, for web servers, look for old CGI vs FastCGI or HTTP forwarding (e.g. URL redirection) or embedded PHP or HOP or Ocsigen
This makes fork-exec cumbersome, as the fork itself can take
significant time
This is only half true. You didn't specify the OS, but fork(2) is pretty optimized in Linux (and I believe in other UNIX variants) by using copy-on-write. Copy-on-write means that the operating system will not copy the entire parent memory address space until the child (or the parent) writes to memory. So you can rest assured that if you have a parent process using 8 GB of memory and then you fork, you won't be using 16 GB of memory - especially if the child execs() something immediately.
fork will fail unless there is enough memory for a copy of the entire
resident memory.
No. The only overhead incurred by fork(2) is the copy and allocation of a task structure for the child, the allocation of a PID, and copying the parent's page tables. fork(2) will not fail if there isn't enough memory to copy the entire parent's address space, it will fail if there isn't enough memory to allocate a new task structure and the page tables. It may also fail if the maximum number of processes for the user has been reached. You can confirm this in man 2 fork (NOTE: See comments below).
If you still don't want to use fork(2), you can use vfork(2), which does no copying at all - it doesn't even copy the page tables - everything is shared with the parent. You can use that to create a new child process with a negligible overhead and then exec() something. Be aware that vfork(2) blocks the calling thread until the child either exits or calls one of the seven exec() functions. You also shouldn't modify the memory inside the child process before calling any of the exec() functions.
You mentioned that you can fork+exec 10k times per second. That sounds very excessive. Have you considered making the things you're execing into a daemon? Or maybe implement those external programs inside your application? It sounds very dodgy to have to fork that much.
fork most likely starts failing for you despite having the memory to back it because you're on a flavor of linux that has disabled (or put a limit on) memory overcommit. Check the file /proc/sys/vm/overcommit_memory. If it's 1 then my guess is wrong and there's something else weird going on. If it's 0 then you're not allowed to overcommit at all. If it's 2 then you need to read the documentation for how exactly this gets configured.
One solution mentioned above is just adding swap (that will never get used).
Another solution is to implement a small daemon that will take commands and execute those forks and execs for you piping back whatever output you need.
N.B. fork of a large process can in theory be as fast as a small process. The performance of fork is determined by how many memory mappings you have rather than how much memory they cover. Setting up copy-on-write is done per mapping. Except that on certain operating systems setting up COW of anonymous mappings is linear to amount of memory in those mappings, but I don't know what Linux does here, last time I studied the VM system in Linux was over 15 years ago.
Is it possible to launch two completely independent programs into one scope of memory area?
For example, I have skype.exe and opera.exe and I want to launch them on a way that will allows them to share common memory. Sounds like threading to me.
These are quite some questions at the same time, let me try to dissect:
It is the definition of a process on a modern OS to have its own virtual address space. So running two processes in the same address space can't happen without a modification to the OS to allow exactly that.
Even if such a modification were available, it would be a less than perfect idea: Access to memory shared between threads is governed by synchronisation primitives explicitly built into them. There is no such mechanism to manage memory access between two processes, that have not explicitly been designed so
Sharing memory if so designed between processes does not at all need them to run in the same virtual address space in their totality: Shared memory segments exist in virtually all modern OS to facilitate exactly that. Again, those processes have to be explicitly designed to use this feature.
If they are two independent programs running then you have to ensure that the data is passed in an independent way between them. Let's say the two programs are running, the first program compute some data that the second program needs. The simplest thing to do is print the data from the first program into a file with some status at the end of the file (to indicate that it is safe for the other program to start reading it). From the other program you have a while loop that checks the status of the last line in that file every period of time.
The other option is to use some library like MPI which has protocols for message passing implemented.
The memory usage of a process can be displayed by running:
$ ps -C processname -o size
SIZE
3808
Is there any way to retrieve this information without executing ps (or any external program), or reading /proc?
On a Linux system, a process' memory usage can be queried by reading /proc/[pid]/statm. Where [pid] is the PID of the process. If a process wants to query its own data, it can do so by reading /proc/self/statm instead. man 5 proc says:
/proc/[pid]/statm
Provides information about memory usage, measured in pages. The
columns are:
size total program size
(same as VmSize in /proc/[pid]/status)
resident resident set size
(same as VmRSS in /proc/[pid]/status)
share shared pages (from shared mappings)
text text (code)
lib library (unused in Linux 2.6)
data data + stack
dt dirty pages (unused in Linux 2.6)
You could just open the file with: fopen("/proc/self/statm", "r") and read the contents.
Since the file returns results in 'pages', you will want to find the page size also. getpagesize () returns the size of a page, in bytes.
You have a few options to do find the memory usage of a program:
Run it within a profiler like Valgrind or memprof.
exec/proc_open/fork a new process to use ps, top, or pmap as you would from the command line
bundle the ps into your app and use it directly (it's open source, of course!)
Use the /proc system (which is all that ps does, anyways...)
Create a report the kernel, which watches over process memory operations. The /proc filesystem is just a view into the kernel's internal data structures, so this is really already done for you.
Develop your own mechanism to compute memory usage without kernel assistance.
The former are all educational from a system administration perspective, and would be the best options in a real-life situation, but the last bullet point is probably the most interesting. You'd probably want to read the source of Valgrind or memprof to see how it works, but essentially what you'd need to do is insert your mechanism between the app and the kernel, and intercept any requests for memory allocation. Additionally, when the process started, you would want to initialize its memory space with a preset value like 0xDEADBEEF. Then, after the process finished, you could read through the memory space and count the occurrences of words other than your preset value, giving you an estimate of memory usage.
Of course, things are always more complicated than they seem. What about memory used by shared libraries? Pipes? Shared memory between your processes and another? System calls? Virtual memory allocated but not used? Data buffered to the disk? There's a lot of calls to be made beyond your question 'memory of process', see this post for some additional concerns.
I'd like to obtain memory usage information for both per process and system wide. In Windows, it's pretty easy. GetProcessMemoryInfo and GlobalMemoryStatusEx do these jobs greatly and very easily. For example, GetProcessMemoryInfo gives "PeakWorkingSetSize" of the given process. GlobalMemoryStatusEx returns system wide available memory.
However, I need to do it on Linux. I'm trying to find Linux system APIs that are equivalent GetProcessMemoryInfo and GlobalMemoryStatusEx.
I found 'getrusage'. However, max 'ru_maxrss' (resident set size) in struct rusage is just zero, which is not implemented. Also, I have no idea to get system-wide free memory.
Current workaround for it, I'm using "system("ps -p %my_pid -o vsz,rsz");". Manually logging to the file. But, it's dirty and not convenient to process the data.
I'd like to know some fancy Linux APIs for this purpose.
You can see how it is done in libstatgrab.
And you can also use it (GPL)
Linux has a (modular) filesystem-interface for fetching such data from the kernel, thus being usable by nearly any language or scripting tool.
Memory can be complex. There's the program executable itself, presumably mmap()'ed in. Shared libraries. Stack utilization. Heap utilization. Portions of the software resident in RAM. Portions swapped out. Etc.
What exactly is "PeakWorkingSetSize"? It sounds like the maximum resident set size (the maximum non-swapped physical-memory RAM used by the process).
Though it could also be the total virtual memory footprint of the entire process (sum of the in-RAM and SWAPPED-out parts).
Irregardless, under Linux, you can strace a process to see its kernel-level interactions. "ps" gets its data from /proc/${PID}/* files.
I suggest you cat /proc/${PID}/status. The Vm* lines are quite useful.
Specifically: VmData refers to process heap utilization. VmStk refers to process stack utilization.
If you continue using "ps", you might consider popen().
I have no idea to get system-wide free memory.
There's always /usr/bin/free
Note that Linux will make use of unused memory for buffering files and caching... Thus the +/-buffers/cache line.