How to find number of children of a process in C? - c

I am doing quite a lot of forking in a process (and the children of that process are further forking), and I want to keep an acceptable limit on the total number of processes I create.
Is there a (preferably efficient) way of finding the total number of children of a process (including children of children, children of children of children, etc.) from C?
I would like my code to work on both linux and mac, so no /proc I'm afraid!

There is no way to enumerate all the children of a process, except by enumerating all the processes of the system and checking their PPID. (Of course, from the parent itself, you can just keep track of what you fork.) There is no way at all to enumerate all the descendants of a process: if P forks Q forks R then Q dies, there is no more information to relate P with R.
The portable way to obtain information about processes is to call the ps utility and parse its output.
If you want to limit the number of descendants of a process, you can do it easily by using a dedicated user to run that process, and starting the ancestor with the desired limit on processes per user (setrlimit(RLIMIT_NPROC, …)).
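A minimal sketch of the per-user limit, assuming a system that provides RLIMIT_NPROC (Linux, macOS and the BSDs do, but it is not part of POSIX). The helper name set_process_limit is hypothetical. The limit counts every process owned by the same real user ID, which is why the dedicated user matters, and on Linux it is not enforced for sufficiently privileged processes.

```c
#include <sys/resource.h>

/* Set the soft limit on processes per user to at most max_procs.
 * The value is clamped to the hard limit, which an unprivileged
 * process cannot raise. Returns 0 on success, -1 on error. */
int set_process_limit(rlim_t max_procs)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    if (max_procs > rl.rlim_max)
        max_procs = rl.rlim_max;
    rl.rlim_cur = max_procs;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

Call it in the ancestor before the first fork(); the limit is inherited by every descendant, and fork() fails with EAGAIN once the user has reached it.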
You can also use a shared resource of some kind; this will work as long as the descendant processes don't close that resource. For example, you can open a file (without the O_CLOEXEC flag), provided the descendants neither call fcntl with the FD_CLOEXEC flag on that file nor simply close it. I think that on OSX you'll need to fork fuser or lsof (either will work on Linux too) to find out how many processes have the file open; I don't know of a way to do that without forking on OSX. You might investigate other mechanisms such as shared memory (shm_open and friends) or memory mappings (mmap and friends), but for these I don't know of a way to get the use count without forking either.

There is no portable API to do what you're asking for. C itself doesn't even define the concept of processes, and the process management APIs of an operating system are very specific and usually not portable.
Either you find a portable abstraction library for what you want to do, or you implement it yourself.

Check this. If you can create a variable shared between all processes, then you can monitor the number of processes based on that shared counter value.
This answer can also help you create a shared variable.

You could open up a pipe or socket in the root process, and have each child write to it when they're created and when they exit. If you want to limit the total number of descendant processes, you could have children check with the root process before they fork, rather than notifying it after.
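A rough sketch of the pipe idea, assuming a hypothetical one-byte protocol: each process writes '+' when it starts and '-' just before it exits, and the root process drains the pipe to keep a running count. The names report_event and drain_events are made up for this illustration. Single-byte writes to a pipe are atomic, so concurrent writers do not interleave; a process that is killed with SIGKILL never gets to write its '-', so the count can drift upward.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Called by a descendant: write one event byte ('+' or '-') to the pipe. */
int report_event(int write_fd, char event)
{
    ssize_t n;

    do {
        n = write(write_fd, &event, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Called by the root: consume all pending events without blocking and
 * update *live. Returns 0 on success, -1 on error. */
int drain_events(int read_fd, int *live)
{
    char buf[256];
    int flags = fcntl(read_fd, F_GETFL);

    if (flags < 0 || fcntl(read_fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    for (;;) {
        ssize_t n = read(read_fd, buf, sizeof buf);
        if (n > 0) {
            for (ssize_t i = 0; i < n; i++)
                *live += (buf[i] == '+') ? 1 : (buf[i] == '-') ? -1 : 0;
        } else if (n == 0) {
            return 0;               /* every writer has closed the pipe */
        } else if (errno == EINTR) {
            continue;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return 0;               /* nothing more to read right now */
        } else {
            return -1;
        }
    }
}
```

The root creates the pipe before the first fork() so that every descendant inherits the write end, and calls drain_events() before deciding whether another fork is allowed.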

Related

Get cpu of child process

I am writing a program that launches several child processes through fork() and needs to periodically track which CPU they are on. Is there any way to accomplish this in C?
I am aware of cpu_getaffinity(), but that's within the process itself. I would like to be able to call a function that would let me know what CPU a child process is running on based on the PID, and I haven't been able to find anything quite related to that.
The closest I've found is to access the /proc/ filesystem, but is there a way to do it within the program and not looking through an external system?
check it in parent process through /proc interface
check it in child processes, and send it back to parent process through IPC, such as shared memory or socket, etc..
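For the first option, a Linux-only sketch that reads /proc/<pid>/stat from the parent. It assumes the layout documented in proc(5), where field 39 ("processor") is the CPU the task last ran on. The function name cpu_of_pid is hypothetical, and treating pid 0 as "the calling process" is a convention chosen for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the CPU number that `pid` last ran on, or -1 on error.
 * pid == 0 means the calling process. */
int cpu_of_pid(pid_t pid)
{
    char path[64], buf[4096];

    if (pid == 0)
        pid = getpid();
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (the command name) is parenthesised and may itself contain
     * spaces or ')', so resume parsing after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;
    p++;

    /* After the iteration for `field`, p points at field + 1. */
    for (int field = 2; field < 39; field++) {
        p = strchr(p, ' ');
        if (p == NULL)
            return -1;
        p++;
    }
    return (int)strtol(p, NULL, 10);
}
```

The value is only a snapshot: the scheduler may move the child to another CPU right after the file is read, so poll it periodically if you need a trend.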

Architecture for multi-processing application in C: fork or fork + exec

My question is more philosophical than technical.
The objective is to write a multiprocess (not multithreaded) program with one "master" process and N "worker" processes. The program is a Linux-only, asynchronous, event-based web server, like nginx. So the main problem is how to spawn the "worker" processes.
In the Linux world there are two ways:
1). fork()
2). fork() + exec*() family
A short description for each way and what confused me in each of them.
The first way, with fork(), is dirty, because the forked process has a copy (copy-on-write, I know) of the parent's memory: signal handlers, variables, file and socket descriptors, the environment, and everything else, e.g. the stack and heap. So after fork() I need to, hmm, "clear memory": for example, disable signal handlers, close socket connections, and deal with other horrible things inherited from the parent, because the child has a lot of data that was never intended for it. This breaks encapsulation, and many side effects are possible.
The usual approach in this case is to run an infinite loop in the forked process to handle data, and to set up a socket pair, pipes, or shared memory as a communication channel between parent and child around fork(), because descriptors are duplicated in the child and refer to the same sockets as in the parent.
Also, this is the nginx way: it has one executable binary that uses fork() to spawn child processes.
The second way is similar to the first, except that the child calls one of the exec*() functions after fork() to run an external binary. The important point is that exec*() loads the binary into the current (forked) process, automatically replaces the stack and heap, and does all the other nasty work, so the child looks like a clean new instance of a program, without a copy of the parent's memory or other trash.
This brings another problem with establishing communication between parent and child: because the forked process discards all data inherited from the parent after exec*(), I need to create a socket pair between parent and child somehow. For example, the parent could create an additional listening socket (a Unix domain socket, or one on another port) and wait for connections, and the child would connect to the parent after initialization.
The first way is simple, but it bothers me that the child is not a clean process, just a copy of the parent's memory, with many possible side effects and trash, and I have to keep in mind that the forked process has many dependencies on the parent's code. The second way takes more effort, since two binaries must be maintained, and it is not as elegant as a single-file solution. Maybe the best way is to use fork() to create the process and then something to clear its memory without an exec*() call, but I can't find any solution for the second step.
In conclusion, I need help deciding which way to use: build a single executable like nginx and use fork(), or build two separate files, a "server" and a "worker", and call fork() + exec*(worker) N times from the "server". I would like to know the pros and cons of each way; maybe I missed something.
For a multiprocess solution, both options, fork and fork+exec, are almost equivalent; the choice depends on the child and parent process context. If the child process executes the parent's text (binary) and needs all or part of the parent's state (descriptors, signals, etc.), that is a sign to use fork. If the child should execute a new binary and needs nothing from the parent's state, fork+exec seems much more suitable.
There is also a good function in the pthread library - pthread_atfork().
It lets you register handlers that are called before and after fork.
These handlers may perform all the necessary work (closing file descriptors, for example).
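A small sketch of that idea, assuming a single descriptor that only the parent should keep. The names parent_only_fd, register_fork_cleanup and child_sees_fd_closed are hypothetical; the last one exists only to demonstrate the effect. pthread_atfork() takes three handlers (prepare, parent, child), and here only the child handler is used.

```c
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int parent_only_fd = -1;   /* descriptor the child must not keep */

/* Runs in the child immediately after every fork(). */
static void close_in_child(void)
{
    if (parent_only_fd >= 0)
        close(parent_only_fd);
}

/* Remember fd and register the child-side handler. Returns 0 on success. */
int register_fork_cleanup(int fd)
{
    parent_only_fd = fd;
    return pthread_atfork(NULL, NULL, close_in_child);
}

/* Demo helper: fork, and report whether the child found fd closed.
 * Returns 1 if closed, 0 if still open, -1 on error. */
int child_sees_fd_closed(int fd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fcntl(fd, F_GETFD) == -1 ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Handlers registered this way cannot be unregistered, so the handler consults a variable rather than hard-coding a descriptor.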
As a Linux programmer, you have a rich set of multithreading facilities available. Look at pthread and friends.
If you need a process per request, then fork and friends have been the most widely used since time immemorial.

How to keep track of all descendant processes to cleanup?

I have a program that can fork() and exec() multiple processes in a chain.
E.g.: process A --> fork, exec B --> fork, exec C --> fork, exec D. So A is the great-grandparent of D.
Now the problem is that I do not have any control of processes B, C and D. So, several things can happen.
A descendant process might call setsid() to change its process group and session.
Or one of the descendant processes dies (say C), and hence its child (D) is reparented to init.
Therefore, I can't rely on process group id or parent id to track all descendants of A. Is there any reliable way of keeping track of all descendants? More specifically, I would like to kill all the descendants (orphans and otherwise).
It would also be great if it's POSIX compliant.
The POSIX way to do this is simply to use process groups. Descendant processes that explicitly change their process group / session are making a deliberate decision not to have their lifetime tracked by their original parent - they are specifically emancipating themselves from the parent's control. Such processes are not orphans - they are adults that have "flown the nest" and wish to exert control over their own lifetime.
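A minimal sketch of the process-group approach, using only POSIX calls. The helper names spawn_group, kill_group_and_wait and sleepy_tree are hypothetical, and sleepy_tree stands in for the real fork/exec chain. As described above, a descendant that calls setsid() or setpgid() leaves the group and is not reached.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that becomes the leader of a new process group and then
 * runs work(). Returns the child's PID (also the group ID), or -1. */
pid_t spawn_group(void (*work)(void))
{
    pid_t pid = fork();

    if (pid == 0) {
        setpgid(0, 0);
        work();
        _exit(0);
    }
    if (pid > 0)
        setpgid(pid, pid);  /* same call in the parent closes the startup race */
    return pid;
}

/* Signal every process still in the group, then reap the leader.
 * Returns the signal that terminated the leader, 0 if it exited
 * normally, or -1 on error. */
int kill_group_and_wait(pid_t leader, int sig)
{
    int status;

    if (kill(-leader, sig) != 0)   /* negative PID addresses the group */
        return -1;
    if (waitpid(leader, &status, 0) != leader)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Stand-in workload: one extra descendant, both sleeping for a while. */
void sleepy_tree(void)
{
    (void)fork();
    sleep(30);
}
```

Only the direct child can be reaped with waitpid(); deeper descendants are reparented when their own parent dies and are reaped by init.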
I agree with caf's general sentiment: if a process calls setsid, it's saying it wants to live on its own, no matter what. You need to think carefully as to whether you really want to kill them.
That being said, sometimes, you will want some form of “super-session” to contain a tree of processes. There is no tool that provides such super-sessions in the POSIX toolbox, but I'm going to propose a few solutions. Each solution has its own limitations, so it's likely that they won't all be applicable to your case, but hopefully one of them will be suitable.
A clean solution is to run the processes in their own virtualized environment. This could be a FreeBSD-style jail, Linux cgroups, or any other kind of virtualization technology. The limitations of this approach are that virtualization technologies are OS-dependent, and the processes will run in a somewhat different context.
If you only have a single instance of these processes on the system and you can get root involved, run the processes as a dedicated user. The super-session is defined as the processes running as the dedicated user. Kill the descendants with kill(-1, signum) (note that this will kill the killer process itself unless it's blocked or handled the signal).
You can make the process open a unique file, making sure that the FD_CLOEXEC flag is not set on the file descriptor. All child processes will then inherit the open file unless they explicitly set the FD_CLOEXEC flag before calling execve or close the file. Kill the processes with fuser -k, or obtain the list of process IDs with fuser or lsof and kill them yourself (fuser is in POSIX, but fuser -k is not). Note that there's a race condition: a process may fork between the time you call fuser and the time you kill it; therefore you need to call fuser in a loop until no more processes appear (don't loop until all processes are dead, as this could be an infinite loop if one of the processes is blocking your signal).
You can generate a unique random string and define an environment variable with that name, or with a well-known name and that unique string as a value. It will be inherited by all descendant processes unless they choose to change their environment. There is no portable way to search for processes based on their environment, or even to obtain the environment of another process. On many unix variants, you can obtain the information with an option to ps (such as ps -e on *BSD or ps e on Linux); the information may not be easy to parse, but the presence of the unique string is a sufficient indicator. As with fuser above, note the need for a loop to avoid a race condition if a descendant calls fork too late for you to notice its child but before you could kill the parent.
You can LD_PRELOAD a small library that forks a thread that listens on a communication channel, and kills its process when notified. This may disrupt the process if it expects to know about all of its own threads; it's only a possibility on architectures where the standard library is always thread-safe, and you'll miss statically linked processes. The communication channel can be anything that allows the master process to broadcast the suicide order; one possibility is a pipe where each descendant process does a blocking read and the ancestor process closes the pipe to notify the descendants. Pass the file descriptor number through an environment variable.
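The approaches above that rely on an inherited descriptor (the marker file, or the pipe used as a broadcast channel) need that descriptor to survive execve, which means its close-on-exec flag must be clear. A small sketch with hypothetical helper names keep_across_exec and survives_exec:

```c
#include <fcntl.h>

/* Clear FD_CLOEXEC so that fd stays open across execve.
 * Returns 0 on success, -1 on error. */
int keep_across_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Returns 1 if fd would be inherited across execve, 0 if it would be
 * closed, -1 if fd is not a valid descriptor. */
int survives_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 0 : 1;
}
```

Descriptors returned by a plain open() or pipe() already have the flag clear; the helper matters when a library opened the file with O_CLOEXEC on your behalf.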

Usage of Mutex across processes

OS: Windows Language: C/C++
The design requires a mutex shared across a process and its subprocesses.
If I create a mutex in one process, I have to open the mutex in another process to check the critical section's availability.
To open the mutex, I need to know the name of the mutex created in the parent process. Suppose I name the mutex after my application. I would then know the name of the mutex, as it is fixed. However, if I launch a second instance of my application in parallel, the names would clash.
Would the following be a better idea?
I have an idea to name the mutex in the parent process after its process ID. So now I need to fetch the parent's process ID from the child or grandchild process to open the mutex.
I guess there is no direct way to fetch the original parent's process ID from a grandchild process. So I have to pass the process ID on in every CreateProcess call (in the lpEnvironment parameter).
Can anyone suggest a simple method, as mutexes are most commonly used.... I am a newbie.
The main idea is fine, but you can maybe make some implementation tweaks.
For one, if your application involves multiple processes cooperating, then the main "controller" process which spawns sub-processes can easily pass its PID via a command line argument. If sub-processes spawn their own children as well, they can transfer the PID via the same mechanism.
Taking this idea further, you can also skip the PID entirely and pass the mutex name itself via command line argument to child processes. This approach has the advantage that the parent and child processes do not need to both include code that derives the mutex name from the PID. By passing the mutex name itself you decouple child processes from having to know how it is produced. This approach is used by many mainstream apps, e.g. Google Chrome.
And finally, you can maybe do better by adding a random string (a GUID maybe?) to the mutex name. I don't believe anyone will name their own global synchronization object with the same name, but some extra precautions won't hurt.
As I understand it, you propose to use a process ID (PID) as the basis for naming a mutex to be used by your application and its subprocesses. This way, they will have their own mutex name that will not clash with the mutex name used by a second instance of your application.
This appears valid, but handles would be more reliable than PIDs, because PIDs can get recycled. The method of using handles (passing them to child processes, similar to what you suggest) is discussed on this StackOverflow thread.
I think passing the information you need to share to child processes is the way to go. Windows has the concept of process groups for a console process and its child processes, but this is really designed for being able to signal all the processes as a group -- not for sharing information among the group.
And there are also job objects for managing a group of processes that belong to a common job, but again, this is designed for managing a group of processes, not for information sharing between the processes in the group.
If I interpret the wording "a process and its sub-processes" as well as "child/grandchild" correctly, the situation is that you have a single parent process that launches one or several children (or children launching grandchildren), or any combination of these. Either way, every process we are talking about uses the same mutex, which is created by the parent.
If that assumption is correct, why not just use something embarrassingly simple as:
#define MUTEXNAME "MzdhYTYzYzc3Mzk4ZDk1NDQ3MzI2MmUxYTAwNTdjMWU2MzJlZGE3Nw"
In case you wonder where this one came from, I generated it with this one-liner:
php -r "echo substr(base64_encode(sha1('some text')), 0, -2);"
Replace 'some text' with your name, the current date, or whatever random words are on your mind at this very moment. The chance that any other application on your system will ever have the same mutex name is practically zero.

In a POSIX environment, how do I track files accessed by a child process?

I have my own POSIX application which starts a child process. I want the parent process to be notified with the names of all files the child process reads or writes, as well as the file names of any child processes the child spawns, and any dynamic libraries it loads. Similarly, I need to monitor all child processes spawned by child processes, etc.
How is this done?
I have two ideas for this.
Method 1 - The "real way".
I think you want ptrace. But it isn't going to be easy to use.
Essentially this call is for writing a debugger. Note that PTRACE_SYSCALL steps until the next syscall. At which point you might be able to use more ptrace calls to peek at the process's memory to observe if it's, say, a call to open().
Method 2 - The lazy, hackish way.
You could use the LD_PRELOAD environment variable. That is, write a shared library with your own implementation of the calls you want to hook (say, open(), dlopen()), adding your own code and dispatching to the normal libc version. Then you point the LD_PRELOAD environment variable at this shared library so the dynamic linker will load it at process start.
One downside to this approach is that if a process knows it's being observed this way, it can reset the environment variable and execute itself again, and evade detection. Another I can think of is that, as a security feature, this environment variable is ignored or restricted when running set-user-ID or set-group-ID programs.
