I have a situation here. A few days back I was able to see a core-dump file on my target board; I had enabled core-dump generation by adding "ulimit -c unlimited" to my /etc/profile.
But then someone told me this will only take effect for programs launched from a login shell, not for processes/services started by systemd, etc., and that the ulimits for those are set somewhere else.
So I changed the /etc/limits file and added a ulimit -c unlimited line, but I still could not see a core-dump file.
I am running kill -9 $$ to generate a segmentation fault, expecting it to produce a core-dump file as it was doing earlier.
We tried changing the /proc/sys/kernel/core_pattern file and running ulimit -c unlimited explicitly, but this was not enough.
Where are we going wrong?
kill -9 will not generate a core file: SIGKILL terminates the process immediately, with no core dump. The command kill -l gives a list of supported signals. kill -6 or kill -SIGABRT should produce a core file, as will the other core-dumping signals such as kill -BUS or kill -SEGV.
kill -11 always works for me. 11 is SIGSEGV (invalid memory reference).
First off, you have to enable the user limit settings to ensure that core files can be created:
ulimit -c unlimited
This must be run as the application user, in the same session, before you start the application. The setting is inherited by the application, so whatever the ulimit is set to before starting the application is what the ulimit setting will be for the application (unless a start script changes it).
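If you want to verify what limit the application actually inherited, a quick check from inside the process works. Here is a minimal sketch using getrlimit(2); the wording of the printout is my own:

#include <stdio.h>
#include <sys/resource.h>

/* Print the core-file size limit this process inherited. */
int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("core limit: unlimited\n");
    else
        printf("core limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);
    return 0;
}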
In addition to the other answers, you might also use gcore(1) to generate a core dump of some running process.
But if you use the kill(1) command (or the underlying kill(2) syscall, e.g. from some ad-hoc program), I recommend sending SIGABRT (the signal that abort(3) sends to itself after unblocking it), as documented in signal(7).
Beware: a program can usually forbid core dumping, e.g. by calling setrlimit(2) with RLIMIT_CORE set to 0, or by handling or ignoring some signals (with e.g. sigaction(2)...).
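For illustration, this is roughly what such a program looks like (a sketch, not from any particular application): it zeroes RLIMIT_CORE and then aborts, and no core file appears.

#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    /* Core-file size limit 0: no core dump will be written. */
    struct rlimit rl = { 0, 0 };

    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        perror("setrlimit");

    /* SIGABRT normally dumps core, but not with the limit above. */
    raise(SIGABRT);
    return 0;
}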
I have to write a program that, among other things, has to kill itself (in response to a received command) and restart after a timeout set before the abort; moreover, it has to log that the restart is due to this kind of operation. On Linux this could be done quite easily using fork() and managing the different PIDs, but unfortunately I have to write this program for Windows, in plain C. I have read several articles saying that a clone of fork() on Windows is a real pain. I have tried to understand CreateProcess, but it appears not well suited to this case. One solution could be to write a second program and pass it the timeout through CreateProcess and a command-line argument, but that is a solution I would like to avoid if possible.
If you need the fork() semantics, then your options are:
Windows Subsystem for Linux, which already has a fork()
Cygwin, which also has a fork()
Write your own... this is not easy... at all
If you can "cheat", an option would be to create and kill threads instead of processes. For transient data, you can use TLS (Thread Local Storage).
Another cheat would be to create a dump file. Say, save a file with MiniDumpWriteDump before terminating the process, then read it with MiniDumpReadDumpStream when starting a new process. This is also not so easy, and it fails if you rebuild your application and use an old dump file. But at least it's a well-known Windows API.
If none of the above works for you, the only remaining option is to use CreateProcess(), which is a spawn(), not a fork(), and then add code to support the fork() features that you need; a minimal sketch follows.
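For completeness, here is a rough CreateProcessA() sketch of the "second program plus command-line timeout" approach the question wanted to avoid (restart.exe and the argument format are placeholders):

#include <windows.h>
#include <stdio.h>

/* Spawn a helper that waits and relaunches us; names are placeholders. */
int relaunch_after_timeout(int seconds)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    char cmdline[256];

    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    ZeroMemory(&pi, sizeof(pi));

    snprintf(cmdline, sizeof(cmdline), "restart.exe %d", seconds);

    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE,
                        0, NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcess failed: %lu\n",
                (unsigned long)GetLastError());
        return -1;
    }

    /* We don't need the handles; the helper runs independently. */
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}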
I'm sure my question has probably been answered before, but I didn't find anything specific to my situation after searching for a while.
Background:
I have written a suite of data acquisition tools in C that run on an embedded system running Debian Wheezy. There is a main module, called Dispatch, whose job is to launch the rest of the modules and pass messages between them. Since the system runs unattended, with no local user interaction, I put a trivial bash script in /etc/init.d that executes Dispatch when the system boots. Dispatch should really be written to function as a daemon, but it is not; the startup script simply executes /opt/bcdispatch &.
There's a bug in one of the other modules that causes it to crash every few days. I'm trying to hunt down that bug, but in the meantime I am trying to write a watchdog program that will detect the crash, kill off all of my processes, then relaunch Dispatch. For reasons I won't go into, it is not sufficient to just relaunch the crashed process; the whole suite of tools needs to be restarted.
What I'm trying to do:
I wrote a simple watchdog program that periodically executes popen("ps aux | grep bc") (all of my process names start with "bc", which makes them easy to find with grep). It detects that one of the modules has crashed by looking for anything with a "zombie" status in the lines read from popen(), kills all of my processes by calling system("kill <PID>"), then executes the startup script in /etc/init.d and exits. I modified the startup script so that it launches the watchdog after launching Dispatch. The startup script now looks like:
/opt/bcdispatch &
/opt/mywatchdog &
Everything is being run as root. There are no other user accounts on the system.
Problem
The watchdog process works fine if I run it from the command line. It kills off all of the processes it's supposed to, launches the startup script, then exits. However, when the watchdog is launched by the startup script at boot time, it doesn't do its thing. It's running, and one of the processes it's monitoring has crashed, but it doesn't kill the rest of them off. It just sits there like a giant turd. I can start another instance of it from the command line and that one works just fine.
Question
So my question (finally!) is: why can't my program kill other processes when launched via a startup script? I suspect it has something to do with the fact that the watchdog process no longer has a terminal associated with it. I tried replacing the system("kill <PID>") call with the kill(2) syscall directly, but that didn't change anything.
EDIT
It just occurred to me that it's not the kill()ing part that doesn't work (well, that might be broken as well): the call to popen("ps aux | grep bc") must not be working, since the watchdog should exit after it finds the zombie process, but it doesn't. Its PID is still the same as it was when the system booted. I guess this means the title of this question isn't very good.
Found the problem. The output of my watchdog's call to popen("ps aux | grep bc") was being truncated to 80 columns, presumably because it was no longer attached to a terminal and that's the default terminal width. That truncation broke the way the program parsed the results of the ps command, so it never found the crashed process. Changing the command to popen("ps -w aux | grep bc") was all that was needed to fix it.
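For reference, the detection loop ends up looking roughly like this (a sketch; the zombie check is simplified to a scan for the 'Z' state flag):

#include <stdio.h>
#include <string.h>

/* Return 1 if any "bc*" process shows up as a zombie, else 0.
 * "ps -w aux" avoids the 80-column truncation seen without a tty. */
int found_zombie(void)
{
    char line[512];
    int zombie = 0;
    FILE *fp = popen("ps -w aux | grep bc", "r");

    if (fp == NULL)
        return 0;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* ps marks zombies with 'Z' in the STAT column. */
        if (strstr(line, " Z") != NULL) {
            zombie = 1;
            break;
        }
    }
    pclose(fp);
    return zombie;
}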
I was hesitant to post this question because I assumed someone somewhere had asked it already, but after much scouring I've come up empty, so here it is.
BACKGROUND: I'm running a local agent (written in C, listening via TCP) which allows a small number of scripts/commands to be executed remotely (via a web interface, to be specific). The scripts themselves are a mixture of binaries, bash scripts, and perl scripts; the agent itself doesn't really care, as long as they are on the allowed list.
(This is on a corporate internal network and in the very early stages, so please don't debate the merits of security at this time.)
The C agent code to launch processes is this:
sprintf(mrun, "%s %s 2>&1", file, args);
mexec = popen(mrun, "r");
/* ... read the returned buffer ... */
pclose(mexec);
This approach works well for both the bash and perl scripts, provided the scripts just execute commands (or do things in the foreground). However, I recently had a need to expand a script to include a restart of a daemon, in this case named.
The script itself (bash) is simple:
#!/bin/bash
pkill -9 named
/local/mnt/named/sbin/named -c /local/mnt/named/var/named.conf &
echo "restarted"
The problem I am running into is that the script never finishes (i.e. "restarted" is never echoed) when run via the C agent, so control is never returned and the TCP socket never gets freed up. As far as the agent is concerned, the process is still running. If I run the script from a terminal, it works fine and control is returned to me.
Am I missing something that would allow the script to execute normally when forked off from a C daemon, versus just being called from a bash terminal?
I know of nohup and I guess I could use that if all else fails, but I was curious whether there is some other kind of workaround for doing this.
Based on feedback from the comments above, I was able to get the script to continue working after launching the daemon process, thanks to some additional redirects:
/local/mnt/named/sbin/named -c /local/mnt/named/var/named.conf </dev/null &> /dev/null &
So, thanks to fork0 for that bit of knowledge.
Afterward, I noticed that the TCP socket connection wouldn't close properly, even though the script was done working. After some more info below and a lot of research, it turns out that child processes inherit (and keep open) file descriptors from the parent process, which includes sockets.
I looked all over for methods to disown the child process but didn't really find any that would work for me (at least, none that didn't amount to an entire rewrite of the agent).
Finally, I stumbled upon this question, which is related but not in a programming language I use:
os.execute without inheriting parent's fds
This basically involved the child process closing any open file descriptors inside the code, thus freeing them to be closed by the parent. (I think?)
I added a few lines to the bash script to do this prior to starting named and it does work.
for i in `nawk 'BEGIN{ for(i=1;i<=255;i++) print i}'`
do
    eval exec `echo $i | sed -e 's/.*/&<\&-/'`   # expands to: exec N<&-
done
(I wound up using nawk instead of seq because I need it to run on Solaris and Linux.)
Some basic testing shows that this has solved the major issue of the socket not being able to close, but I'll need to do some more research on whether this has any other ramifications that I am not aware of. There may also be a better, safer way to achieve this, but at least I'm on the right track.
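For what it's worth, the equivalent fix on the C side, if a small change to the agent were acceptable, would be to mark the agent's descriptors close-on-exec so popen()'d children never inherit them in the first place. A sketch, where sockfd stands in for the agent's listening or connection socket:

#include <fcntl.h>

/* Mark a descriptor close-on-exec so children spawned via popen()
 * do not inherit it and cannot hold the socket open. */
static int set_cloexec(int sockfd)
{
    int flags = fcntl(sockfd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(sockfd, F_SETFD, flags | FD_CLOEXEC);
}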
I've written a program in C for an embedded system (Devkit8000, which is a clone of the well-known BeagleBoard) running Angstrom Linux.
The program creates a couple of threads; one of them is responsible for taking pictures with a camera connected to the board, and right now the second thread only moves those images to another path. The program should run during the whole day, and the only way to stop it is by sending it a signal.
I edited the crontab to launch the program at a specific hour and to send a signal when it has to stop. The issue is that launching the program this way causes the process to be killed after some time running; but if I launch the program manually (through the command line), it works perfectly and doesn't get stopped.
I have no idea about the reason for this different behaviour between crontab and the command line. I've checked the system logs but didn't find anything useful. I've also been reading a little and found that the OS can kill a process if it is using too many resources, but it doesn't make sense that this happens in only one scenario (crontab vs. manually)...
Any clue about what is happening?
Thank you in advance!
The main difference is that running a job through cron invokes a non-interactive, non-login shell. The effect of that depends on the default shell for your user. For example, if you are using Korn shell or Bash, then your .profile will not be executed, as it would be on an interactive login shell. Korn shell 88 will execute .kshrc (the $ENV file), but ksh93 will not.
So, a good start might be to call your program from a script, after first "sourcing" your .profile file:
. $HOME/.profile
Failing that... when you say that the process is "killed", do you actually get such a message? If so, then that sounds like something sending SIGKILL, i.e. kill -9. If not, then maybe you could run it under strace or ltrace to find out at what point it dies.
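If the signal turns out to be catchable (anything but SIGKILL), a small handler that logs before dying can identify it. A sketch; the log path is a placeholder, and fopen() inside a handler is not strictly async-signal-safe but is fine for one-off debugging:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Log the fatal signal, then re-raise it with the default action. */
static void fatal_handler(int sig)
{
    FILE *f = fopen("/tmp/killed-by.log", "a");

    if (f != NULL) {
        fprintf(f, "dying on signal %d\n", sig);
        fclose(f);
    }
    signal(sig, SIG_DFL);
    raise(sig);
}

int main(void)
{
    signal(SIGTERM, fatal_handler);
    signal(SIGHUP,  fatal_handler);
    signal(SIGINT,  fatal_handler);

    /* ... the real program would do its work here ... */
    pause();
    return 0;
}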
My Fedora 12 installation has a tool called ABRT that probably comes with GNOME. This tool operates in the background and reports, in real time, any process that has crashed.
I have used a signal handler that was able to catch a SIGSEGV signal, i.e. a process could report that it had itself crashed.
What other ways exist for a process to get information about the state (especially a core) of another process, without having a parent-child connection?
Any ideas? It seems a very interesting issue.
ABRT is open source, after all, so why not look at their code? The architecture is explained here; it looks like they monitor $COREDUMPDIR to detect when a new core file appears.
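If you wanted to do the same thing yourself on Linux, inotify(7) can watch the dump directory for new core files. A minimal sketch, where /var/cores is a placeholder path:

#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    /* Buffer aligned for struct inotify_event, as inotify(7) suggests. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int fd = inotify_init();

    if (fd < 0) { perror("inotify_init"); return 1; }

    /* Report files created in the core-dump directory. */
    if (inotify_add_watch(fd, "/var/cores", IN_CREATE) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    ssize_t n = read(fd, buf, sizeof(buf));  /* blocks until an event */
    if (n > 0) {
        const struct inotify_event *ev = (const struct inotify_event *) buf;
        if (ev->len > 0)
            printf("new file: %s\n", ev->name);
    }
    close(fd);
    return 0;
}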
Your question is not entirely clear, but it is possible to get a core of a running process using gcore:
gcore(1)                         GNU Tools                         gcore(1)

NAME
       gcore - Generate a core file for a running process

SYNOPSIS
       gcore [-o filename] pid

DESCRIPTION
       gcore generates a core file for the process specified by its
       process ID, pid. By default, the core file is written to core.pid,
       in the current directory.

       -o filename
              write core file to filename instead of core.pid