Program can't kill() processes when launched from boot script - c

My question has probably been answered before, but I didn't find anything specific to my situation after searching for a while.
Background:
I have written a suite of data acquisition tools in C that run on an embedded system running Debian Wheezy. There is a main module, called Dispatch, whose job is to launch the rest of the modules and pass messages between them. Since the system runs unattended, I put a trivial bash script in /etc/init.d that executes Dispatch when the system boots. Because there is no local user interaction, Dispatch should really be written to function as a daemon, but it is not. The startup script simply executes /opt/bcdispatch &.
There's a bug in one of the other modules that causes it to crash every few days. I'm trying to hunt down that bug, but in the meantime I am trying to write a watchdog program that will detect the crash, kill off all of my processes, then relaunch Dispatch. For reasons I won't go into, it is not sufficient to just relaunch the crashed process; the whole suite of tools needs to be restarted.
What I'm trying to do:
I wrote a simple watchdog program that periodically executes popen("ps aux | grep bc") (all of my process names start with "bc", which makes them easy to find with grep). It detects that one of the modules has crashed by looking for a "zombie" status in any of the lines read from popen(), kills all of my processes by calling system("kill <PID>"), then executes the startup script in /etc/init.d and exits. I modified the startup script so that it launches the watchdog after launching Dispatch. The startup script now looks like:
/opt/bcdispatch &
/opt/mywatchdog &
Everything is being run as root. There are no other user accounts on the system.
Problem
The watchdog process works fine if I run it from the command line. It kills off all of the processes it's supposed to, launches the startup script, then exits. However, when the watchdog is launched by the startup script at boot time it doesn't do its thing. It's running, one of the processes it's monitoring has crashed, but it doesn't kill the rest of them off. It just sits there like a giant turd. I can start another instance of it from the command line and that one works just fine.
Question
So my question (finally!) is: why can't my program kill other processes when launched via a startup script? I suspect it has something to do with the watchdog process no longer having a terminal associated with it. I tried replacing the call to system("kill <PID>") with a direct call to kill(), but that didn't change anything.
EDIT
It just occurred to me that it's not the kill()ing part that doesn't work (well, that might be broken as well). The call to popen("ps aux | grep bc") must not be working, since the watchdog should exit after it finds the zombie process, but it doesn't. Its PID is still the same as it was when the system booted. I guess this means the title of this question isn't very good.

Found the problem. The output of my watchdog's call to popen("ps aux | grep bc") was being truncated to 80 columns, presumably because it was no longer attached to a terminal and that's the default terminal width. That truncation was causing problems for the way the program was parsing the results of the ps command so it never found the crashed process. Changing the command to popen("ps -w aux | grep bc") was all that was needed to fix it.

Related

C Windows: generate a child process in order to stop and restart (after timeout) the parent process

I have to write a program that, among other things, has to kill itself (in response to a received command) and restart after a timeout set before aborting; moreover, it has to log that the restart was caused by this kind of operation. In Linux this could be done quite easily using fork() and managing the different PIDs, but unfortunately I have to write this program for Windows, using plain C. I have read several articles saying that cloning fork() on Windows is a real pain. I have tried to understand CreateProcess, but it doesn't seem well suited to this case. One solution could be to write a second program and pass it the timeout through CreateProcess as a command-line argument, but that is a solution I would like to avoid if possible.
If you need the fork() semantics, then your options are:
Windows Subsystem for Linux, which already has a fork()
Cygwin, which also has a fork()
Write your own... this is not easy... at all
If you can "cheat", an option would be to create and kill threads instead of processes. For transient data, you can use TLS (Thread Local Storage).
Another cheat would be to create a dump file. Say, save a file with MiniDumpWriteDump before terminating a process, later read it with MiniDumpReadDumpStream when starting a new process. This is also not so easy and it fails if you rebuild your application and use an old dump file. But, at least it's a well known Windows API.
If none of the above works for you, the only remaining option is to use CreateProcess(), which is a spawn(), not a fork(), then add code to support the fork() features that you need.

Killing every process except system's and my own

I'm trying to make a virus to run on VMware so I can have some fun with Ubuntu and experiment with it. I would like my experimental virus (although it's hardly a virus; it's more of an actual program) to be able to kill/terminate every process except itself and the system processes.
I thought of 2 options:
Either I get the IDs of all the non-system processes and kill each one, comparing each to mine first to avoid killing myself,
or there's an actual built-in command or function that does this. I did some research and succeeded in making my process 'shielded' from any terminating/killing signals, but I'm not sure how to find the IDs of other (non-system) processes.
Any idea on how to perform this?
The following command lists every process on the system:
ps aux

Daemonize/Background A Process Launched Via Script From Another Program

I was hesitant to post this question because I assumed someone somewhere had asked it already but after much scouring, I've come up empty, so here it is.
BACKGROUND: I'm running a local agent (written in C, listening via TCP) which allows for execution of a small number of scripts/commands remotely. (Via a web interface, to be specific.) The scripts themselves are a mixture of binaries, bash, or perl scripts and the agent itself doesn't really care, as long as they are allowed in the list.
(This is on a corporate, internal network and this is in the very early stages, so please don't debate the merits of security at this time.)
The C agent code to launch processes is this:
snprintf(mrun, sizeof(mrun), "%s %s 2>&1", file, args);  /* assumes mrun is a char array */
mexec = popen(mrun, "r");
if (mexec != NULL) {
    /* [read some returned buffer] */
    pclose(mexec);
}
This approach works well for both external bash and perl scripts, provided the scripts just execute commands (or do things in the foreground). However, I recently had a need to expand a script to include a restart of a daemon, in this case, named.
The script itself (bash) is simple:
#!/bin/bash
pkill -9 named
/local/mnt/named/sbin/named -c /local/mnt/named/var/named.conf &
echo "restarted"
The problem I am running into is that the script never finishes (i.e. "restarted" is never echoed) when run via the C agent, so control is never returned and the TCP socket never gets freed. As far as the agent is concerned, the process is still running. If I run the script from a terminal, it works fine and control is returned to me.
Am I missing something that would allow the script to execute normally when being forked off from a C daemon versus just being called from the bash terminal?
I know of nohup, and I guess I could use that if all else fails, but I was curious whether there is some other workaround.
Based on feedback from the comments above, I was able to get the script to continue working after launching the daemon process, thanks to some additional redirects:
/local/mnt/named/sbin/named -c /local/mnt/named/var/named.conf </dev/null &> /dev/null &
So, thanks to fork0 for that bit of knowledge.
Afterward, I noticed that the TCP socket connection wouldn't close properly, even though the script was done working. After some more info below and doing a lot of research, it turns out that child processes will inherit (and keep open) file descriptors from the parent process (which includes sockets).
I looked all over for methods to disown the child process but didn't really find any that would work for me (or didn't constitute an entire rewrite of the agent).
Finally, I stumbled upon this question, which is related but not in a programming language I use:
os.execute without inheriting parent's fds
This basically involved the child process closing any open file descriptors inside the code, thus freeing them to be closed by the parent. (I think?)
I added a few lines to the bash script to do this prior to starting named and it does work.
for i in `nawk 'BEGIN{ for(i=1;i<=255;i++) print i}'`
do
eval exec `echo $i | sed -e 's/.*/&<\&-/'`
done
(I wound up using nawk instead of seq because I need it to run on Solaris and Linux.)
Some basic testing shows that this has solved the major issue of the socket not being able to close but I'll need to do some more research on whether this will have any other ramifications that I am not aware of. There may also be a better, safer way to achieve this but at least I'm on the right track.

Issue with program executed by crontab

I've coded a program in c for an embedded system (Devkit8000, which is a clone of the well known BeagleBoard) running Angstrom Linux.
The program creates a couple of threads: one of them is responsible for taking pictures with a camera connected to the board, and right now the second thread only moves those images to another path. The program should run for the whole day, and the only way to stop it is by sending a signal.
I edited the crontab to launch the program at a specific hour and to send a signal when it has to stop. The issue is that when launched this way, the process gets killed after running for some time, but if I launch the program manually (through the command line), it works perfectly and doesn't get stopped.
I have no idea what causes this difference in behaviour between crontab and the command line. I've checked the system logs but didn't find anything useful. I've also been reading a little and found that the OS can kill a process if it uses too many resources, but it doesn't make sense that this would happen in only one scenario (crontab vs. manually)...
Any clue about what is happening?
Thank you in advance!
The main difference is that running a job through cron invokes a non-interactive non-login shell. The effect of that depends on the default shell for your user. For example, if you are using Korn shell or Bash then your .profile will not be executed, as it would on an interactive login shell. Korn shell 88 will execute .kshrc (the $ENV file) but ksh93 will not.
So, a good start might be to call your program from a script, after first "sourcing" your .profile file:
. $HOME/.profile
Failing that... When you say that the process is "killed", do you get such a message? If so, then that sounds like someone sending SIGKILL, i.e. kill -9. If not, then maybe you could run strace or ltrace to find out at what point it dies.

Building an "odometer" for time spent on a server

I want to build an odometer to keep track of how long I've been on a server since I last reset the counter.
Recently I've been logging quite a bit of time working on one of my school's Unix servers and began wondering just how much time I had racked up in the last couple of days. I started trying to think of how I could go about writing either a Bash script or C program to run when my .bash_profile was loaded (i.e., when I ssh into the server), background itself, and save the time to a file when I closed the session.
I know how to make a program run when I login (through the .bash_profile) and how to background a C program (by way of forking?), but am unsure how to detect that the ssh session has been terminated (perhaps by watching the sshd process?)
I hope this is the right stack exchange to ask how you would go about something like this and appreciate any input.
Depending on your shell, you may be able to just spawn a process in the background when you log in, and then handle the signal it receives (typically SIGHUP) when the parent process (the shell) exits. It would consume almost no resources, you wouldn't need root privileges, and it should give a fairly accurate report of your logged-in time.
You may need to use POSIX semaphores to handle the case of multiple shells logged in simultaneously.
Have you considered writing a script that can be run by cron every minute, running "who", looking at its output for lines with your uid in them, and bumping a counter if it finds any? (Use "crontab -e" to edit your crontab.)
Even just a line in crontab like this:
* * * * * (date; who | grep $LOGNAME)>>$HOME/.whodata
...would create a log you could process later at your leisure.

Resources