Any possible solution to capture process entry/exit? - c

I would like to capture process entry and exit events and maintain a log for the entire system (probably via a daemon process).
One approach would be to read the /proc filesystem periodically and maintain the list myself, since it does not seem possible to register inotify watches on /proc. For desktop applications I could use D-Bus, and capture the event whenever a client registers with the desktop.
But for non-desktop applications I don't see how to proceed, apart from reading /proc periodically.
Kindly provide suggestions.

You mentioned /proc, so I'm going to assume you've got a Linux system there.
Install the acct package. The lastcomm command shows all processes executed and their run duration, which is what you're asking for. Have your program "tail" /var/log/account/pacct (you'll find its structure described in acct(5)) and voilà. That only gives you notification on termination, though. To detect start-ups, you'll need to dig through the system process table periodically, if that's what you really need.
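A minimal sketch of that "tail" in C, assuming the classic record layout from <sys/acct.h> (newer kernels may write the v3 format, struct acct_v3, so check acct(5) and your kernel config before trusting these field offsets):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/acct.h>   /* struct acct, ACCT_COMM */

    int main(void)
    {
        FILE *f = fopen("/var/log/account/pacct", "rb");
        if (!f) { perror("fopen"); return 1; }
        fseek(f, 0, SEEK_END);               /* start at the end, like tail -f */

        struct acct rec;
        for (;;) {
            if (fread(&rec, sizeof rec, 1, f) == 1)
                printf("exited: %.*s\n", ACCT_COMM, rec.ac_comm);
            else {
                clearerr(f);                 /* clear EOF, wait for new records */
                sleep(1);
            }
        }
    }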

Maybe the safer way to go is to create a SuperProcess that acts as the parent and forks the children itself. Every time a child process stops, the parent can detect it. That is just a thought, in case that architecture fits your needs; a sketch follows.
Of course, if a common parent process is not doable, then you must go to the kernel.
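A quick sketch of that idea in C; the child commands here are placeholders. Because the supervisor is the parent of every child, a plain wait() wakes it the moment any of them exits:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        const char *progs[] = { "/bin/sleep", "/bin/true" };  /* placeholder children */
        for (int i = 0; i < 2; i++) {
            pid_t pid = fork();
            if (pid == 0) {                       /* child: become the program */
                execl(progs[i], progs[i], "2", (char *)NULL);
                _exit(127);                       /* only reached if exec failed */
            }
            printf("started %s as pid %d\n", progs[i], (int)pid);
        }
        int status;
        pid_t pid;
        while ((pid = wait(&status)) > 0)         /* blocks until a child exits */
            printf("pid %d exited, status %d\n", (int)pid, WEXITSTATUS(status));
        return 0;
    }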

If you want to log really all process entries and exits, you'll need to hook into the kernel, which means modifying it or at least writing a kernel module. The Linux Security Modules framework will certainly allow hooking into process entry, but I am not sure whether it's possible to hook into exit.
If you can live with an occasional exit slipping past (when the binary is statically linked or somehow avoids your environment settings), there is a simpler option: preloading a library.
The Linux dynamic linker has a feature whereby, if the environment variable LD_PRELOAD names a shared library, it will force-load that library into every starting process. So you can create a library that, in its static initialization, tells the daemon that a process has started, and arranges things so that the daemon will also find out when the process exits.
Static initialization is most easily done by creating a global object with a constructor in C++. The dynamic linker will ensure the static constructor runs when the library is loaded.
It will also try to run the corresponding destructor when the process exits, so you could simply log the process in the constructor and destructor. But that won't work if the process dies of signal 9 (KILL), and I am not sure what other signals will do.
So instead you should have a daemon, tell it about the process start in the constructor, and make sure it will notice when the process exits on its own. One option that comes to mind is opening a unix-domain socket to the daemon and leaving it open; the kernel will close it when the process dies, and the daemon will notice. You should take the precaution of using a high descriptor number for the socket, since some processes assume the low descriptor numbers (3, 4, 5) are free and dup2 to them. And don't forget to raise the file-descriptor limits for the daemon and for the system in general.
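A rough sketch of such a preload library in C, using GCC's constructor attribute instead of a C++ global object; the daemon socket path /run/pmon.sock is a made-up name. Build with gcc -shared -fPIC preload.c -o libpmon.so and run programs with LD_PRELOAD=/path/to/libpmon.so:

    #include <fcntl.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    static int mon_fd = -1;

    __attribute__((constructor))
    static void pmon_start(void)
    {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return;
        /* park the descriptor high up, so code that blindly dup2()s
           onto 3, 4, 5 does not clobber it */
        int high = fcntl(fd, F_DUPFD_CLOEXEC, 200);
        if (high >= 0) { close(fd); fd = high; }

        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, "/run/pmon.sock", sizeof addr.sun_path - 1);
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            close(fd);
            return;
        }
        mon_fd = fd;   /* kept open for the life of the process */
        /* The daemon learns our pid via SO_PEERCRED; no message needed.
           When this process dies - even from SIGKILL - the kernel closes
           the socket and the daemon sees EOF: "process exited". */
    }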
Note that by just polling the /proc filesystem you would probably miss the great number of processes that live only for a split second. There are really a lot of them on Unix.

Here is an outline of the solution that we came up with.
We created a program that read a configuration file listing all the applications the system is able to monitor. The program read that configuration file, and through a command-line interface you were able to start or stop programs. The program itself stored a table in shared memory that marked applications as running or not. An interface that anybody could access could query the status of these programs. The program also had an alarm system that could email/page someone or set off an alarm.
This solution does not require any changes to the kernel and is therefore a less painful solution.
Hope this helps.

Related

When a process writes to a file

Generally, when a process writes to a file, e.g. a Python script running open('file', 'w').write('text'), what are the exact events that occur? By that I mean something along the lines of "process A loads the file from hard disk into RAM, process B changes the content, then ...". I've read about IPC and now I'm trying to dig deeper and understand more about processes. I couldn't find a thorough explanation of the subject, so if you could find one or explain it I'd really appreciate it.
The example of "a python script running open('file', 'w').write('text')" is heavily OS-dependent. The only process necessarily involved is the one running the Python interpreter, which, e.g. on Linux, sometimes executes in user space and sometimes in kernel space, plus possibly some kernel-only processes; any IPC, if required, happens inside the kernel. Nothing requires that the work, right down to the disk write itself, be handled anywhere other than in the user's process while it is running in kernel mode, but in practice other processes may be involved. This is OS- and even driver-specific behavior.
In this particular example (which isn't great, because it relies on CPython automatically closing the file when the variable goes out of scope), the Python process makes a system call to open the file, one to write to it, and one to close it. These are all blocking -- that is, they do not return until the results are ready. While the process is blocked, it is put on a queue, waiting for some event to occur that makes it ready to run again.
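Stripped of the Python layer, the same three blocking calls look like this in C; each one traps into the kernel, and the process is descheduled until the result is ready:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return 1;
        write(fd, "text", 4);   /* typically returns once the data is in the
                                   kernel's page cache, not when it hits disk */
        close(fd);
        return 0;
    }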
The opposite of this is asynchronous I/O, which can be performed by polling, by callbacks, or by the select() call, which can block until any one of a number of events has occurred.
But when most people talk about IPC, they are not usually talking about communication between or with kernel processes. Rather, they mean communication between multiple user processes and/or threads, using semaphores, mutexes, named pipes, etc. A good introduction to these would be any tutorial material you can find on pthreads, or even the Python threading and multiprocessing modules; there are examples there for several simple cases.
The primary difference between processes and threads on Linux is that threads share an address space and processes each have their own address space. Python itself adds the wrinkle of the GIL, which limits the utility of threads in Python somewhat.

setting up IPC between unrelated processes

I would like to inject a shared library into a process (I'm using ptrace() to do that part) and then be able to get output from the shared library back into the debugger I'm writing using some form of IPC. My instinct is to use a pipe, but the only real requirements are:
I don't want to store anything on the filesystem to facilitate the communication as it will only last as long as the debugger is running.
I want a portable Unix solution (so Unix-standard syscalls would be ideal).
The problem I'm running into is that as far as I can see, if I call pipe() in the debugger, there is no way to pass the "sending" end of the pipe to the target process, and vice versa with the receiving end. I could set up shared memory, but I think that would require creating a file somewhere so I could reference the memory segment from both processes. How do other debuggers capture output when they attach to a process after it has already begun running?
I assume you need a debugging system for your business-logic code (i.e. your application). In my experience, this kind of problem is tackled with the system design explained below. (My experience is with C++; the same should hold for a C-based system.)
Have a logger system (a separate process). It contains the logger manager and the logging code, which take responsibility for dumping the log to disk.
Each application instance (a process running on Unix) communicates with this process over sockets, so you can define your own messaging protocol and talk to the logger system via socket-based communication.
Later, give each application a switch that turns logging on or off, so that you can have a tool that signals the process to toggle message logging.
At a high level, this is the most generic way to develop a logging system. If you need more information, leave a comment and I will try to answer.
Using better search terms showed me this question is a duplicate of these:
Can I share a file descriptor to another process on linux or are they local to the process?
Can I open a socket and pass it to another process in Linux
How to use sendmsg() to send a file-descriptor via sockets between 2 processes?
The top answers were what I was looking for. You can use a Unix-domain socket to hand a file descriptor off to a different process. This could work either from debugger to library or vice versa, but is probably easier to do from debugger to library because the debugger can write the socket's address into the target process while it injects the library.
However, once I pass the socket's address into the target process, I might as well just use the socket itself instead of using a pipe in addition.
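For reference, the descriptor-passing machinery those answers describe looks roughly like this in C (see unix(7) and cmsg(3)); the receiving side mirrors it with recvmsg() and copies the int back out of CMSG_DATA():

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send an open file descriptor over a connected unix-domain socket. */
    int send_fd(int sock, int fd)
    {
        char dummy = '*';                     /* must send at least one byte */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
        memset(&u, 0, sizeof u);

        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = u.buf, .msg_controllen = sizeof u.buf,
        };
        struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type  = SCM_RIGHTS;           /* "this message carries fds" */
        c->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(c), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }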

Is it possible to interrupt a process and checkpoint it to resume it later on?

Let's say you have an application that is consuming all the computational power. Now you want to do some other necessary work. Is there any way on Linux to interrupt that application and checkpoint its state, so that it can later be resumed from the state where it was interrupted?
I am especially interested in a way where the application could be stopped and then restarted on another machine. Is that possible too?
In general terms, checkpointing a process is not entirely possible, because a process is not only an address space but also holds other resources, like file descriptors, TCP/IP sockets, and so on.
In practice, you can use some checkpointing libraries like BLCR etc. With certain limiting conditions, you might be able to migrate a checkpoint image from one system to another one (very similar to the source one: same kernel, same versions of libraries & compilers, etc.).
Migrating images is also possible at the virtual machine level. Some of them are quite good for that.
You could also design and implement your software with its own checkpointing machinery. Then you should think in terms of garbage-collection techniques and terminology. Look also into the Emacs (or XEmacs) unexec.c file (which is heavily machine-dependent).
Some language implementations & runtimes have checkpointing primitives. SBCL (a free Common Lisp implementation) is able to save a core image and restart it later. SML/NJ is able to export an image. Squeak (a Smalltalk implementation) also has this ability.
As another example of checkpointing, the GCC compiler is able to compile a single *.h header into a pre-compiled header file, which is a persistent image of the GCC heap, by using persistence techniques.
Read more about orthogonal persistence; it is also a research subject. Serialization is also relevant (and you might want to use textual formats à la JSON, YAML, XML, ...). You might also use hibernation techniques (at the whole-system level).
From the man page (man kill): interrupting a process requires two steps.
To stop it:
kill -STOP <pid>
and to continue:
kill -CONT <pid>
where <pid> is the process ID.
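The same two signals can be sent from C with kill(2); a trivial sketch:

    #include <signal.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;
        pid_t pid = (pid_t)atoi(argv[1]);
        kill(pid, SIGSTOP);   /* freeze: cannot be caught or ignored */
        /* ... do the other necessary work ... */
        kill(pid, SIGCONT);   /* thaw: the process resumes where it stopped */
        return 0;
    }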
Type Control+Z to suspend a process (it sends SIGTSTP), then bg / fg to resume it in the background or foreground.
Checkpointing an individual process is fundamentally impossible on POSIX. That's because processes are not independent; they can interact. If nothing else, a process has a unique process ID, which it might have stored somewhere internally, and if you resume it with a different process ID, all hell could break loose. This is especially true if the process uses any kind of locks/synchronization primitives. You also can't resume the process with the same process ID it originally had, since that might have been taken by a new process in the meantime.
Perhaps you could solve the problem by making process (and thread) IDs 128-bit or so, such that they're universally unique...
On Linux it is achievable by sending the process a STOP signal. Later on you resume it by sending a CONT signal. Please refer to the kill manual.

Inter-program communication for an arbitrary number of programs

I am attempting to have a bunch of independent programs intelligently allocate shared resources among themselves. However, I might have only one program running, or a whole bunch of them.
My thought was to mmap a virtual file in each program, but the concurrency is killing me. Mutexes are obviously ineffective, because each program could hold its own lock on the file while being completely oblivious of the others. My attempts to write a semaphore have all failed, since the semaphore would live inside the file, and I can't rely on only one process writing to it at a time, etc.
I've seen quite a bit about named pipes, but they don't seem to me to be a practical solution for what I'm doing, since I don't know how many other programs there will be, if any, nor have I any way of identifying which programs are participating in my resource-sharing operation.
You could use a UNIX-domain socket (AF_UNIX) - see man 7 unix.
When a process starts up, it tries to bind() a well-known path. If the bind() succeeds then it knows that it is the first to start up, and becomes the "resource allocator". If the bind() fails with EADDRINUSE then another process is already running, and it can connect() to it instead.
You could also use a dedicated resource allocator process that always listens on the path, and arbitrates resource requests.
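A bare-bones sketch of that startup race in C; the socket path is a hypothetical name, and note that a stale socket file left behind by a crashed allocator will make bind() fail even though nobody is listening:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    int main(void)
    {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return 1;
        struct sockaddr_un a = { .sun_family = AF_UNIX };
        strncpy(a.sun_path, "/tmp/arbiter.sock", sizeof a.sun_path - 1);

        if (bind(fd, (struct sockaddr *)&a, sizeof a) == 0) {
            listen(fd, 8);
            puts("first one up: acting as resource allocator");
            /* accept() loop: hand out resources, revoke on disconnect ... */
        } else if (errno == EADDRINUSE) {
            connect(fd, (struct sockaddr *)&a, sizeof a);
            puts("allocator already running: connected as a client");
            /* send requests, wait for grants ... */
        }
        return 0;
    }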
Not entirely clear what you're trying to do, but personally my first thought would be to use D-Bus. It should be easy enough within that framework for your processes/programs to register/announce themselves and enumerate/signal other registered processes, and/or to create a central resource arbiter and communicate with it. It's readily available on any system with GNOME or KDE installed, too.

kernel programming

Is there any other way to execute a program using the kernel, other than the shell and system calls?
It always used to be the case that there was really only one way to execute a program on Unix and its derivatives, and that was via the exec() system calls. The very first (kernel) process was created by the boot loader; all subsequent processes were created by fork() and exec(). Of course, fork() only created a copy of the original program; it was the exec() system call - in one of a number of forms in the C source code, but eventually equivalent to execve() - that did the donkey work of replacing the current process with a new image.
These days, there are mechanisms like posix_spawn() which might, or might not, use a separate system call to achieve roughly the same mechanism.
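A small example of the posix_spawn() route; under glibc it is typically implemented in user space on top of clone()/execve() rather than as a single dedicated system call:

    #include <spawn.h>
    #include <stdio.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void)
    {
        pid_t pid;
        char *argv[] = { "ls", "-l", NULL };
        if (posix_spawn(&pid, "/bin/ls", NULL, NULL, argv, environ) != 0)
            return 1;
        int status;
        waitpid(pid, &status, 0);   /* collect the child, just as with fork() */
        printf("exit status: %d\n", WEXITSTATUS(status));
        return 0;
    }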
A lot of kernels have support for loading kernel modules or drivers at run time. If you want to execute some code in kernel space (probably because you need higher privileges), you can write a kernel module/driver of your own and load it. However, merely inserting a driver doesn't ensure that your code will be executed; depending on your implementation, you will need some triggering mechanism that causes your code to run in kernel space.
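The skeleton of such a module is tiny. The init/exit functions below run in kernel space when the module is loaded and unloaded (insmod/rmmod); everything else - the "triggering mechanism" - is up to you:

    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    static int __init hello_init(void)
    {
        printk(KERN_INFO "hello: loaded, running in kernel space\n");
        return 0;       /* nonzero would abort the module load */
    }

    static void __exit hello_exit(void)
    {
        printk(KERN_INFO "hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");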
Yeah, you can compile your kernel with your program's source built into it, but it wouldn't be the smartest thing to do.
Every program is ultimately executed by the kernel. If you are looking to run a kernel module, you have to use system calls to reach that module and have it perform some work for you in kernel mode. The kernel is event-driven, and only system calls trigger execution of its modules (apart from some system events, like a network packet being received).
