I am currently doing some research on how we can extend the monitoring solution for Linux in our datacenter in order to detect inaccessible NFS mounts. My idea was to look for NFS mounts in /proc/self/mountinfo and then, for each mount, call alarm(), issue a synchronous interruptible call via stat()/fsstat() or similar, and in case of an alarm, return an error in the signal handler. However, I experienced the following behaviour, which I am not sure how to explain or debug.
It turned out that when a process is waiting in the stat system call on a mountpoint of a disconnected NFS server and the signal dispositions are left at their defaults, it responds to signals as expected. For example, one can exit it by pressing Ctrl+C, or it displays "Alarm clock" and ends when the alarm timer fires. The same applies e.g. to SIGUSR1/2, leading the program to display "User defined signal 1" (or "2") and end. I suspect these messages come from a general signal dispatcher inside glibc, but it would be nice to hear some details on how this works.
In all cases in which a custom signal handler was registered, the process transitions to an uninterruptible sleep state when a signal for this custom handler is scheduled; leading to no other signal being processed anymore. Of course this applies to SIGALRM as well when the alarm() timer sends the signal. All signals show up in /proc/PID/status as below:
Threads: 1
SigQ: 4/31339
SigPnd: 0000000000000000
ShdPnd: 0000000000002a02
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000200
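As a reading aid for the masks above: each value is a hexadecimal bitmap in which bit n-1 stands for signal number n. The following is only a minimal sketch of a decoder; decode_sigmask is a hypothetical helper name, not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper: list the signal numbers set in a /proc/PID/status
 * mask such as ShdPnd or SigCgt. Bit (n-1) of the mask stands for signal n.
 * Writes at most max entries to out and returns how many were written. */
size_t decode_sigmask(unsigned long long mask, int *out, size_t max)
{
    size_t count = 0;

    for (int signo = 1; signo <= 64 && count < max; signo++) {
        if (mask & (1ULL << (signo - 1)))
            out[count++] = signo;
    }
    return count;
}
```

Read this way, ShdPnd 0000000000002a02 corresponds to signals 2, 10, 12 and 14 (SIGINT, SIGUSR1, SIGUSR2 and SIGALRM on x86-64), and SigCgt 0000000000000200 to signal 10, which matches the SIGUSR1 handler installed by the test program.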
I looked at the information from "echo w > /proc/sysrq-trigger" but there is nothing of help to me:
[26099350.815187] signal D 0000000000000000 0 49633 39989 0x00000084
[26099350.815193] ffff880001d27b88 0000000000000046 ffff88008a8184c0 ffff880001d28000
[26099350.815199] ffff880001d27c18 ffffffff81e0a168 ffffffffa03d1df0 0000000000000000
[26099350.815204] ffff880001d27ba0 ffffffff81619dd5 ffff88008a8184c0 0000000000000082
[26099350.815209] Call Trace:
[26099350.815213] [<ffffffff81619dd5>] schedule+0x35/0x80
[26099350.815223] [<ffffffffa03d1e0e>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
[26099350.815227] [<ffffffff8161a1ea>] __wait_on_bit+0x5a/0x90
[26099350.815231] [<ffffffff8161a32e>] out_of_line_wait_on_bit+0x6e/0x80
[26099350.815242] [<ffffffffa03d2e7e>] __rpc_execute+0x14e/0x450 [sunrpc]
[26099350.815251] [<ffffffffa03ca089>] rpc_run_task+0x69/0x80 [sunrpc]
[26099350.815259] [<ffffffffa06dd166>] nfs4_call_sync_sequence+0x56/0x80 [nfsv4]
[26099350.815267] [<ffffffffa06ddc90>] _nfs4_proc_getattr+0xb0/0xc0 [nfsv4]
[26099350.815279] [<ffffffffa06e7c83>] nfs4_proc_getattr+0x53/0xd0 [nfsv4]
[26099350.815288] [<ffffffffa06a37c4>] __nfs_revalidate_inode+0x94/0x2a0 [nfs]
[26099350.815296] [<ffffffffa06a3d7e>] nfs_getattr+0x7e/0x250 [nfs]
[26099350.815303] [<ffffffff8121455a>] vfs_fstatat+0x5a/0x90
[26099350.815306] [<ffffffff812149ca>] SYSC_newstat+0x1a/0x40
[26099350.815312] [<ffffffff8161de61>] entry_SYSCALL_64_fastpath+0x20/0xe9
[26099350.817782] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x20/0xe9
It is also not possible to access anything in usermode as it is not possible to attach a debugger.
The development happened on SLES 12 SP4, Kernel version 4.4.162-94.72-default.
I am attaching some C and bash code for reproduction; the issue can be triggered with SIGUSR1 (kill -USR1 PID) or, with changes to the code, with any other signal. As for C, there is no difference between using signal() and sigaction() to install the handler. The handlers are deliberately left empty to be sure there is no "forbidden" function called inside.
Thanks for any idea helping me further.
C-Code:
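#include <signal.h>
#include <sys/stat.h>
#include <unistd.h>   /* alarm() */

/* Deliberately empty: no non-async-signal-safe function is called here. */
void sig_handler(int sig)
{
}

int main(void) {
    int ret;
    struct stat buf;

    signal(SIGUSR1, sig_handler);
    alarm(30);
    ret = stat("/a", &buf);   /* blocks while the NFS server is unreachable */
    return 0;
}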
bash-Code:
#!/bin/bash
sighandler() {
declare unused
}
trap sighandler USR1
[[ -d /a ]] && echo "stat() returned"
In Linux I want to add a daemon that cannot be stopped and which monitors filesystem changes.
If any changes are detected, it should write the path to the console where it was started plus a newline.
I already have the filesystem changing code almost ready but I cannot figure out how to create a daemon.
My code is from here: http://www.yolinux.com/TUTORIALS/ForkExecProcesses.html
What to do after the fork?
#include <cstdlib>     // exit
#include <iostream>    // cerr, endl
#include <string>
#include <unistd.h>    // fork

using namespace std;

int main (int argc, char **argv) {
    string sIdentifier;
    pid_t pID = fork();
    if (pID == 0) { // child
        // Code only executed by child process
        sIdentifier = "Child Process: ";
    }
    else if (pID < 0) {
        cerr << "Failed to fork" << endl;
        exit(1);
        // Throw exception
    }
    else // parent
    {
        // Code only executed by parent process
        sIdentifier = "Parent Process:";
    }
    return 0;
}
In Linux I want to add a daemon that cannot be stopped and which monitors filesystem changes. If any changes are detected, it should write the path to the console where it was started plus a newline.
Daemons work in the background and (usually...) don't belong to a TTY, which is why you can't use stdout/stderr in the way you probably want.
Usually a syslog daemon (syslogd) is used for logging messages to files (debug, error,...).
Besides that, there are a few required steps to daemonize a process.
If I remember correctly these steps are:
fork off the parent process & let it terminate if forking was successful. -> Because the parent process has terminated, the child process now runs in the background.
setsid - Create a new session. The calling process becomes the leader of the new session and the process group leader of the new process group. The process is now detached from its controlling terminal (CTTY).
Catch signals - Ignore and/or handle signals.
fork again & let the parent process terminate to ensure that you get rid of the session leading process. (Only session leaders may get a TTY again.)
chdir - Change the working directory of the daemon.
umask - Change the file mode mask according to the needs of the daemon.
close - Close all open file descriptors that may be inherited from the parent process.
To give you a starting point: Look at this skeleton code that shows the basic steps. This code can now also be forked on GitHub: Basic skeleton of a linux daemon
/*
* daemonize.c
* This example daemonizes a process, writes a few log messages,
* sleeps 20 seconds and terminates afterwards.
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <syslog.h>
static void skeleton_daemon()
{
pid_t pid;
/* Fork off the parent process */
pid = fork();
/* An error occurred */
if (pid < 0)
exit(EXIT_FAILURE);
/* Success: Let the parent terminate */
if (pid > 0)
exit(EXIT_SUCCESS);
/* On success: The child process becomes session leader */
if (setsid() < 0)
exit(EXIT_FAILURE);
/* Catch, ignore and handle signals */
/* TODO: Implement a working signal handler */
signal(SIGCHLD, SIG_IGN);
signal(SIGHUP, SIG_IGN);
/* Fork off for the second time*/
pid = fork();
/* An error occurred */
if (pid < 0)
exit(EXIT_FAILURE);
/* Success: Let the parent terminate */
if (pid > 0)
exit(EXIT_SUCCESS);
/* Set new file permissions */
umask(0);
/* Change the working directory to the root directory */
/* or another appropriate directory */
chdir("/");
/* Close all open file descriptors */
int x;
for (x = sysconf(_SC_OPEN_MAX); x>=0; x--)
{
close (x);
}
/* Open the log file */
openlog ("firstdaemon", LOG_PID, LOG_DAEMON);
}
int main()
{
skeleton_daemon();
while (1)
{
//TODO: Insert daemon code here.
syslog (LOG_NOTICE, "First daemon started.");
sleep (20);
break;
}
syslog (LOG_NOTICE, "First daemon terminated.");
closelog();
return EXIT_SUCCESS;
}
Compile the code: gcc -o firstdaemon daemonize.c
Start the daemon: ./firstdaemon
Check if everything is working properly: ps -xj | grep firstdaemon
The output should be similar to this one:
+------+------+------+------+-----+-------+------+------+------+-----+
| PPID | PID | PGID | SID | TTY | TPGID | STAT | UID | TIME | CMD |
+------+------+------+------+-----+-------+------+------+------+-----+
| 1 | 3387 | 3386 | 3386 | ? | -1 | S | 1000 | 0:00 | ./ |
+------+------+------+------+-----+-------+------+------+------+-----+
What you should see here is:
The daemon has no controlling terminal (TTY = ?)
The parent process ID (PPID) is 1 (The init process)
The PID != SID which means that our process is NOT the session leader
(because of the second fork())
Because PID != SID our process can't take control of a TTY again
Reading the syslog:
Locate your syslog file. Mine is here: /var/log/syslog
Do a: grep firstdaemon /var/log/syslog
The output should be similar to this one:
firstdaemon[3387]: First daemon started.
firstdaemon[3387]: First daemon terminated.
A note:
In reality you would also want to implement a signal handler and set up the logging properly (Files, log levels...).
Further reading:
Linux-UNIX-Programmierung - German
Unix Daemon Server Programming
man 7 daemon describes how to create a daemon in great detail. My answer is just an excerpt from this manual.
There are at least two types of daemons:
traditional SysV daemons (old-style),
systemd daemons (new-style).
SysV Daemons
If you are interested in traditional SysV daemon, you should implement the following steps:
Close all open file descriptors except standard input, output, and error (i.e. the first three file descriptors 0, 1, 2). This ensures that no accidentally passed file descriptor stays around in the daemon process. On Linux, this is best implemented by iterating through /proc/self/fd, with a fallback of iterating from file descriptor 3 to the value returned by getrlimit() for RLIMIT_NOFILE.
Reset all signal handlers to their default. This is best done by iterating through the available signals up to the limit of _NSIG and resetting them to SIG_DFL.
Reset the signal mask using sigprocmask().
Sanitize the environment block, removing or resetting environment variables that might negatively impact daemon runtime.
Call fork(), to create a background process.
In the child, call setsid() to detach from any terminal and create an independent session.
In the child, call fork() again, to ensure that the daemon can never re-acquire a terminal again.
Call exit() in the first child, so that only the second child (the actual daemon process) stays around. This ensures that the daemon process is re-parented to init/PID 1, as all daemons should be.
In the daemon process, connect /dev/null to standard input, output, and error.
In the daemon process, reset the umask to 0, so that the file modes passed to open(), mkdir() and suchlike directly control the access mode of the created files and directories.
In the daemon process, change the current directory to the root directory (/), in order to avoid that the daemon involuntarily blocks mount points from being unmounted.
In the daemon process, write the daemon PID (as returned by getpid()) to a PID file, for example /run/foobar.pid (for a hypothetical daemon "foobar") to ensure that the daemon cannot be started more than once. This must be implemented in race-free fashion so that the PID file is only updated when it is verified at the same time that the PID previously stored in the PID file no longer exists or belongs to a foreign process.
In the daemon process, drop privileges, if possible and applicable.
From the daemon process, notify the original process started that initialization is complete. This can be implemented via an unnamed pipe or similar communication channel that is created before the first fork() and hence available in both the original and the daemon process.
Call exit() in the original process. The process that invoked the daemon must be able to rely on that this exit() happens after initialization is complete and all external communication channels are established and accessible.
Note this warning:
The BSD daemon() function should not be used, as it implements only a subset of these steps.
A daemon that needs to provide compatibility with SysV systems should implement the scheme pointed out above. However, it is recommended to make this behavior optional and configurable via a command line argument to ease debugging as well as to simplify integration into systems using systemd.
Note that daemon() is not POSIX compliant.
New-Style Daemons
For new-style daemons the following steps are recommended:
If SIGTERM is received, shut down the daemon and exit cleanly.
If SIGHUP is received, reload the configuration files, if this applies.
Provide a correct exit code from the main daemon process, as this is used by the init system to detect service errors and problems. It is recommended to follow the exit code scheme as defined in the LSB recommendations for SysV init scripts.
If possible and applicable, expose the daemon's control interface via the D-Bus IPC system and grab a bus name as last step of initialization.
For integration in systemd, provide a .service unit file that carries information about starting, stopping and otherwise maintaining the daemon. See systemd.service(5) for details.
As much as possible, rely on the init system's functionality to limit the access of the daemon to files, services and other resources, i.e. in the case of systemd, rely on systemd's resource limit control instead of implementing your own, rely on systemd's privilege dropping code instead of implementing it in the daemon, and similar. See systemd.exec(5) for the available controls.
If D-Bus is used, make your daemon bus-activatable by supplying a D-Bus service activation configuration file. This has multiple advantages: your daemon may be started lazily on-demand; it may be started in parallel to other daemons requiring it, which maximizes parallelization and boot-up speed; your daemon can be restarted on failure without losing any bus requests, as the bus queues requests for activatable services. See below for details.
If your daemon provides services to other local processes or remote clients via a socket, it should be made socket-activatable following the scheme pointed out below. Like D-Bus activation, this enables on-demand starting of services as well as it allows improved parallelization of service start-up. Also, for state-less protocols (such as syslog, DNS), a daemon implementing socket-based activation can be restarted without losing a single request. See below for details.
If applicable, a daemon should notify the init system about startup completion or status updates via the sd_notify(3) interface.
Instead of using the syslog() call to log directly to the system syslog service, a new-style daemon may choose to simply log to standard error via fprintf(), which is then forwarded to syslog by the init system. If log levels are necessary, these can be encoded by prefixing individual log lines with strings like "<4>" (for log level 4 "WARNING" in the syslog priority scheme), following a similar style as the Linux kernel's printk() level system. For details, see sd-daemon(3) and systemd.exec(5).
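The "<4>"-style prefix described in the last item can be sketched as below. The helper names format_log_line and log_stderr are hypothetical, chosen only for illustration; the numeric level is assumed to be a syslog priority between 0 and 7.

```c
#include <stdio.h>

/* Hypothetical helper: format one log line with a "<N>" syslog priority
 * prefix. Returns the value of snprintf(), i.e. the length the full line
 * needs, or a negative value on error. */
int format_log_line(char *buf, size_t size, int level, const char *msg)
{
    return snprintf(buf, size, "<%d>%s\n", level, msg);
}

/* Write one prefixed line to standard error, for the init system to forward. */
int log_stderr(int level, const char *msg)
{
    return fprintf(stderr, "<%d>%s\n", level, msg);
}
```

For example, log_stderr(4, "disk almost full") writes the line <4>disk almost full to standard error.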
To learn more read whole man 7 daemon.
You cannot create a process in Linux that cannot be killed. The root user (uid=0) can send a signal to any process, and two signals cannot be caught, blocked or ignored: SIGKILL (9) and SIGSTOP (19 on most architectures). Other signals, when uncaught, can also result in process termination.
You may want a more general daemonize function, where you can specify a name for your program/daemon, and a path to run your program (perhaps "/" or "/tmp"). You may also want to provide file(s) for stderr and stdout (and possibly a control path using stdin).
Here are the necessary includes:
#include <stdio.h> //printf(3), fopen(3)
#include <stdlib.h> //exit(3)
#include <unistd.h> //fork(2), setsid(2), chdir(2), close(2), sleep(3), sysconf(3)
#include <signal.h> //signal(2)
#include <sys/stat.h> //umask(2)
#include <syslog.h> //syslog(3), openlog(3), closelog(3)
And here is a more general function,
int
daemonize(char* name, char* path, char* outfile, char* errfile, char* infile )
{
if(!path) { path="/"; }
if(!name) { name="medaemon"; }
if(!infile) { infile="/dev/null"; }
if(!outfile) { outfile="/dev/null"; }
if(!errfile) { errfile="/dev/null"; }
//printf("%s %s %s %s\n",name,path,outfile,infile);
pid_t child;
//fork, detach from process group leader
if( (child=fork())<0 ) { //failed fork
fprintf(stderr,"error: failed fork\n");
exit(EXIT_FAILURE);
}
if (child>0) { //parent
exit(EXIT_SUCCESS);
}
if( setsid()<0 ) { //failed to become session leader
fprintf(stderr,"error: failed setsid\n");
exit(EXIT_FAILURE);
}
//catch/ignore signals
signal(SIGCHLD,SIG_IGN);
signal(SIGHUP,SIG_IGN);
//fork second time
if ( (child=fork())<0) { //failed fork
fprintf(stderr,"error: failed fork\n");
exit(EXIT_FAILURE);
}
if( child>0 ) { //parent
exit(EXIT_SUCCESS);
}
//new file permissions
umask(0);
//change to path directory
chdir(path);
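//Close all open file descriptors
int fd;
//include fd 0, so that the fopen() calls below receive descriptors 0, 1, 2
for( fd=sysconf(_SC_OPEN_MAX); fd>=0; --fd )
{
close(fd);
}
//reopen stdin, stdout, stderr
//(assigning to stdin/stdout/stderr works with glibc but is not portable C)
stdin=fopen(infile,"r"); //fd=0
stdout=fopen(outfile,"w+"); //fd=1
stderr=fopen(errfile,"w+"); //fd=2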
//open syslog
openlog(name,LOG_PID,LOG_DAEMON);
return(0);
}
Here is a sample program, which becomes a daemon, hangs around, and then leaves.
int
main()
{
int res;
int ttl=120;
int delay=5;
if( (res=daemonize("mydaemon","/tmp",NULL,NULL,NULL)) != 0 ) {
fprintf(stderr,"error: daemonize failed\n");
exit(EXIT_FAILURE);
}
while( ttl>0 ) {
//daemon code here
syslog(LOG_NOTICE,"daemon ttl %d",ttl);
sleep(delay);
ttl-=delay;
}
syslog(LOG_NOTICE,"daemon ttl expired");
closelog();
return(EXIT_SUCCESS);
}
Note that SIG_IGN tells the kernel to ignore the signal; no handler is run for it. You could instead build a signal handler that sets flags (such as a flag to indicate graceful shutdown) and log the signal receipt from the main loop.
Try using the daemon function:
#include <unistd.h>
int daemon(int nochdir, int noclose);
From the man page:
The daemon() function is for programs wishing to detach themselves
from the controlling terminal and run in the background as system
daemons.
If nochdir is zero, daemon() changes the calling process's current
working directory to the root directory ("/"); otherwise, the current
working directory is left unchanged.
If noclose is zero, daemon() redirects standard input, standard
output and standard error to /dev/null; otherwise, no changes are
made to these file descriptors.
I can stop at the first requirement "A daemon which cannot be stopped ..."
Not possible my friend; however, you can achieve the same with a much better tool, a kernel module.
http://www.infoq.com/articles/inotify-linux-file-system-event-monitoring
All daemons can be stopped. Some are more easily stopped than others. Even a daemon pair with the partner in hold down, respawning the partner if lost, can be stopped. You just have to work a little harder at it.
If your app is one of:
{
".sh": "bash",
".py": "python",
".rb": "ruby",
".coffee" : "coffee",
".php": "php",
".pl" : "perl",
".js" : "node"
}
and you don't mind a NodeJS dependency then install NodeJS and then:
npm install -g pm2
pm2 start yourapp.yourext --name "fred" # where .yourext is one of the above
pm2 start yourapp.yourext -i 0 --name "fred" # run your app on all cores
pm2 list
To keep all apps running on reboot (and daemonise pm2):
pm2 startup
pm2 save
Now you can:
service pm2 stop|restart|start|status
(also easily allows you to watch for code changes in your app directory and auto restart the app process when a code change happens)
Daemon Template
I wrote a daemon template following the new-style daemon: link
You can find the entire template code on GitHub: here
Main.cpp
// This function will be called when the daemon receives a SIGHUP signal.
void reload() {
LOG_INFO("Reload function called.");
}
int main(int argc, char **argv) {
// The Daemon class is a singleton to avoid being instantiated more than once
Daemon& daemon = Daemon::instance();
// Set the reload function to be called in case of receiving a SIGHUP signal
daemon.setReloadFunction(reload);
// Daemon main loop
int count = 0;
while(daemon.IsRunning()) {
LOG_DEBUG("Count: ", count++);
std::this_thread::sleep_for(std::chrono::seconds(1));
}
LOG_INFO("The daemon process ended gracefully.");
}
Daemon.hpp
class Daemon {
public:
static Daemon& instance() {
static Daemon instance;
return instance;
}
void setReloadFunction(std::function<void()> func);
bool IsRunning();
private:
std::function<void()> m_reloadFunc;
bool m_isRunning;
bool m_reload;
Daemon();
Daemon(Daemon const&) = delete;
void operator=(Daemon const&) = delete;
void Reload();
static void signalHandler(int signal);
};
Daemon.cpp
Daemon::Daemon() {
m_isRunning = true;
m_reload = false;
signal(SIGINT, Daemon::signalHandler);
signal(SIGTERM, Daemon::signalHandler);
signal(SIGHUP, Daemon::signalHandler);
}
void Daemon::setReloadFunction(std::function<void()> func) {
m_reloadFunc = func;
}
bool Daemon::IsRunning() {
if (m_reload) {
m_reload = false;
m_reloadFunc();
}
return m_isRunning;
}
void Daemon::signalHandler(int signal) {
LOG_INFO("Interrupt signal number [", signal,"] received.");
switch(signal) {
case SIGINT:
case SIGTERM: {
Daemon::instance().m_isRunning = false;
break;
}
case SIGHUP: {
Daemon::instance().m_reload = true;
break;
}
}
}
daemon-template.service
[Unit]
Description=Simple daemon template
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/daemon-template --conf_file /etc/daemon-template/daemon-template.conf
ExecReload=/bin/kill -HUP $MAINPID
User=root
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=daemon-template
[Install]
WantedBy=multi-user.target
By calling fork() you've created a child process. If the fork is successful, execution continues from this point in both processes: fork() returns 0 in the child and the child's (non-zero) PID in the parent. In this case we want to gracefully exit the parent process and then continue our work in the child process.
Maybe this will help:
http://www.netzmafia.de/skripten/unix/linux-daemon-howto.html
A daemon is just a process in the background. If you want to start your program when the OS boots, on linux, you add your start command to /etc/rc.d/rc.local (run after all other scripts) or /etc/startup.sh
On windows, you make a service, register the service, and then set it to start automatically at boot in administration -> services panel.
I have the following code, and I run ps aux | grep myprogram at each step of the main() of myprogram (the name of the application I build).
At the beginning of the execution of myprogram, ps aux | grep myprogram shows myprogram only once in the list.
After cancelling a thread that I created at the beginning of main(), ps aux | grep myprogram shows myprogram twice, and I expected to get only one.
Could someone explain this behaviour, and how to return to the initial situation (only one myprogram)?
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>   /* sleep() */

pthread_t test_thread;

void *thread_test_run (void *v)
{
    int i=1;
    while(1)
    {
        printf("into thread %d\r\n",i);
        i++;
        sleep(1);
    }
    return NULL;
}

int main()
{
    // ps aux | grep myprogram ---> show only 1 myprogram
    pthread_create(&test_thread, NULL, &thread_test_run, NULL);
    // ps aux | grep myprogram ---> show 3 myprogram
    sleep (20);
    pthread_cancel(test_thread);
    // ps aux | grep myprogram ---> show 2 myprogram and I expected only 1 !!??
    // other function are called here...
    return 0;
}
EDIT
the libc used by the linux is libc-0.9.30.1.so
# ls -l /lib/| grep libc
-rwxr-xr-x 1 root root 16390 Jul 11 14:04 ld-uClibc-0.9.30.1.so
lrwxrwxrwx 1 root root 21 Jul 30 10:16 ld-uClibc.so.0 -> ld-uClibc-0.9.30.1.so
lrwxrwxrwx 1 root root 21 Jul 30 10:16 libc.so.0 -> libuClibc-0.9.30.1.so
-rw-r--r-- 1 root root 8218 Jul 11 14:04 libcrypt-0.9.30.1.so
lrwxrwxrwx 1 root root 20 Jul 30 10:16 libcrypt.so.0 -> libcrypt-0.9.30.1.so
-rw-r--r-- 1 root root 291983 Jul 11 14:04 libuClibc-0.9.30.1.so
I'll assume you have some outdated glibc (version 2.2 or 2.3), which used the "linuxthreads" implementation of pthreads.
In this older library, one additional thread is created by the library for thread management; it is created on the first call to pthread_create and sleeps most of the time.
Newer Linux systems ship glibc with the NPTL ("Native POSIX Thread Library") implementation. When it is used, you will not see threads in ps axu; use ps axum (with m) to see native threads. NPTL also does not use a management thread.
PS Check http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html D.5 answer:
D.5: When I'm running a program that creates N threads, top or ps display N+2 processes that are running my program. What do all these processes correspond to?
Due to the general "one process per thread" model, there's one process for the initial thread and N processes for the threads it created using pthread_create. That leaves one process unaccounted for. That extra process corresponds to the "thread manager" thread, a thread created internally by LinuxThreads to handle thread creation and thread termination. This extra thread is asleep most of the time.
PPS: Thanks, Mohamed KALLEL; thanks, mux: libc-0.9.30.1 is uClibc, and it seems to use the same outdated linuxthreads implementation (which is known not to be fully POSIX-compatible). Here is the changelog: http://web.archive.org/web/20070609171609/http://www.uclibc.org/downloads/Changelog
0.9.10 21 March 2002
Major new features:
o pthreads support (derived from glibc 2.1.3's linuxthreads library)
by Stefan Soucek and Erik Andersen
You have a (very) old Linux system, where your threads are shown as processes by tools such as ps; this will not happen on newer Linux systems.
However, a thread is normally not entirely disposed of when it returns or you kill/cancel it.
For that to happen you have to call pthread_detach(pthread_self()) in your thread or create the thread in detached state.
A detached thread disposes of its thread resources when it ends, and it cannot be joined by pthread_join() later:
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&test_thread, &attr, &thread_test_run, NULL);
pthread_attr_destroy(&attr);
Or you have to call pthread_join() on it, e.g.
pthread_create(&test_thread, NULL, &thread_test_run, NULL);
sleep (20);
pthread_cancel(test_thread);
pthread_join(test_thread, NULL);
pthread_join() will ensure the thread resources are released.
The concept is roughly analogous to zombie processes, where a process is not entirely disposed of until a parent calls wait() or similar on that process.
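Put together, the cancel-then-join sequence can be sketched as a small self-contained helper. cancel_and_join and worker are hypothetical names; the sketch assumes the default deferred cancellation type and is meant to be built with -pthread.

```c
#include <pthread.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);               /* sleep() is a cancellation point */
    return NULL;
}

/* Hypothetical helper: start a thread, cancel it, and join it.
 * Returns 1 if the joined thread reports PTHREAD_CANCELED, 0 if it ended
 * some other way, -1 on error. */
int cancel_and_join(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)    /* releases the thread's resources */
        return -1;
    return result == PTHREAD_CANCELED;
}
```

pthread_cancel() only requests cancellation; it is the pthread_join() call that waits for the thread to actually end and collects its exit status.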
Could be an older pthreads implementation as suggested by the other answers, however, if you are using a glibc 2.4 or higher which uses NPTL, then note that the ps command you used won't even show threads, instead use:
ps -AL
The output before and after pthread_cancel() is:
$ ps -AL | grep tst
983 983 pts/2 00:00:00 tst
983 984 pts/2 00:00:00 tst
$ ps -AL | grep tst
983 983 pts/2 00:00:00 tst
There's more information about it here. As for uClibc specifically: since it doesn't have NPTL, ps aux should show all the threads.
Since the introduction of the Native POSIX Threads Library (NPTL)
threads have become rather elusive. They don't show up in the
default process listing using the ps command. If you do have the
luxury of the full ps command (from the procps package) on your target
system then all you need to do is add the "-L" option.
I am afraid I am not sure what I'm doing wrong here.
I have a threaded application that starts 3 threads upon start
[root#Embest /]# ps
1111 root 608 S fw634c_d_cdm_sb
1112 root 608 S fw634c_d_cdm_sb
1113 root 608 S fw634c_d_cdm_sb
then waits in standby mode for commands from the serial port.
After it runs and returns to standby mode, I check with ps what's going on; there are zombified instances of the application (and the file name is in square brackets too)
1114 root Z [fw634c_d_cdm_sb]
...
...
...
1768 root Z [fw634c_d_cdm_sb]
About 628 of them.
The thing is, the policy I'm following is:
- for detachable threads: don't care (they will exit and free resources on their own after completing)
- for joinable threads: I run pthread_join after running pthread_create and wait for the threaded function to complete, like this:
if (pthread_create(&tmp_thrd_id,&attr_joinable,run_function,(void *)&aStruct)!=0){
DEBUG(printf("thread NOT created \n"));
}else{
DEBUG(printf("thread created !\n"));
if (pthread_join(tmp_thrd_id,NULL)!=0){
DEBUG(printf("\nERROR in joining \n"));
}else{
DEBUG(printf("Thread completed\n"));
}
}
I only run pthread_exit(NULL) in main, which doesn't do much and, after the startup, just lies around because it must not be killed.
I'm probably forgetting something vital here, but I can't figure out what after reading a few basic guides on threads...
thank you for your help
A "zombie" thread is a thread that has exited, and is waiting around for someone to call pthread_join to collect its exit status. So somewhere in your program you are creating threads and not eventually calling pthread_join or pthread_detach for those threads.
I want to run a program as a daemon on a remote machine in Unix. I have an rsh connection and I want the program to keep running after disconnection.
Suppose I have two programs: util.cpp and forker.cpp.
util.cpp is some utility; for our purpose let it be just an infinite loop.
util.cpp
int main() {
while (true) {};
return 0;
}
forker.cpp takes some program and runs it in a separate process through fork() and execve():
forker.cpp
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char** argv) {
if (argc != 2) {
printf("./a.out <program_to_fork>\n");
exit(1);
}
pid_t pid;
if ((pid = fork()) < 0) {
perror("fork error.");
exit(1);
} else if (!pid) {
// Child.
if (execve(argv[1], &(argv[1]), NULL) == -1) {
perror("execve error.");
exit(1);
}
} else {
// Parent: do nothing.
}
return 0;
}
If I run:
./forker util
forker finishes very quickly, bash is not blocked, and util keeps running as a daemon.
But if I run:
scp forker remote_server://some_path/
scp program remote_server://some_path/
rsh remote_server 'cd /some_path; ./forker program'
then it is all the same (i.e. on the remote server forker finishes quickly and util keeps running), but my bash on the local machine is blocked.
It is waiting for util to stop (I checked it: if util.cpp returns, then it is OK), but I don't understand why.
There are two questions:
1) Why is it paused when I run it through rsh?
I am sure that I chose some stupid way to run a daemon. So:
2) How do I run some program as a daemon in C/C++ on Unix-like platforms?
Tnx!
1) Why is it paused when I run it through rsh?
When you fork a process, the child process has its own copy of the parent's file descriptors. Each of the child's file descriptors refers to the same open file description as the corresponding file descriptor of the parent. After you call fork() you are not closing the standard streams (stdin, stdout, stderr) in the child process before your call to execve(), so they are still connected to rsh. It may be the case that rsh will not return as long as any process on the remote server is holding a reference to these streams. You could try closing the standard streams using fclose() before your call to execve(), or redirect them when you execute your forker program (i.e. ./forker program >/dev/null 2>/dev/null </dev/null).
2) How to run some program as daemon in C/C++ in unix-like platforms.
According to wikipedia, nohup is most often used to run commands in the background as daemons. There are also several daemon related questions on this site you can refer to for information.
From wikipedia:
nohup is a POSIX command to ignore the HUP (hangup) signal, enabling the command to keep running after the user who issues the command has logged out. The HUP (hangup) signal is by convention the way a terminal warns depending processes of logout.
If your program will always run as a daemon, you can look into the possibility of calling daemon() from within your program. The daemon() convenience function exists in some UNIX systems.
From the daemon(3) man page:
The daemon() function is for programs wishing to detach themselves from the controlling terminal and run in the background as system daemons.
Should this function not exist for you or should there be instances where your program does not run as a daemon, your forker program can also be modified to 'daemonize' your other program.
Without making any changes to your code, you could try something like the following:
rsh remote_server 'cd /some_path; nohup ./forker program >program.out 2>program.err </dev/null &'