Zombified threads in ps (for a threaded program written in C)

I'm not sure what I'm doing wrong here.
I have a threaded application that starts 3 threads at startup:
[root@Embest /]# ps
1111 root 608 S fw634c_d_cdm_sb
1112 root 608 S fw634c_d_cdm_sb
1113 root 608 S fw634c_d_cdm_sb
It then waits in standby mode for commands from the serial port.
After it runs a command and returns to standby mode, I check with ps to see what is going on: there are zombified instances of the application (and the file name is shown in square brackets too):
1114 root Z [fw634c_d_cdm_sb]
...
...
...
1768 root Z [fw634c_d_cdm_sb]
There are about 628 of them.
The policy I'm following is:
- For detachable threads: don't care (they exit and free their resources on their own after completing).
- For joinable threads: I call pthread_join right after pthread_create and wait for the thread function to complete, like this:
if (pthread_create(&tmp_thrd_id,&attr_joinable,run_function,(void *)&aStruct)!=0){
DEBUG(printf("thread NOT created \n"));
}else{
DEBUG(printf("thread created !\n"));
if (pthread_join(tmp_thrd_id,NULL)!=0){
DEBUG(printf("\nERROR in joining \n"));
}else{
DEBUG(printf("Thread completed\n"));
}
}
I only call pthread_exit(NULL) in main, which doesn't do much: after startup it just stays around because it must not be killed.
I'm probably forgetting something vital here, but I can't work out what, even after reading a few basic guides on threads.
Thank you for your help.

A "zombie" thread is a thread that has exited, and is waiting around for someone to call pthread_join to collect its exit status. So somewhere in your program you are creating threads and not eventually calling pthread_join or pthread_detach for those threads.
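A minimal sketch of the two options, under some assumptions: worker is a hypothetical stand-in for run_function from the question, and run_and_join and start_detached are names made up for illustration. Every joinable thread gets exactly one pthread_join, and a thread nobody will join is created detached.

```c
#include <pthread.h>

/* Hypothetical worker: stands in for run_function from the question. */
static void *worker(void *arg)
{
    int *value = arg;
    *value *= 2;               /* pretend to do some work */
    return arg;                /* exit status, collected by pthread_join */
}

/* Start up to n joinable threads and join every one that was created.
 * Returns the number of threads that were successfully joined. */
int run_and_join(int n)
{
    enum { MAX_THREADS = 16 };
    pthread_t ids[MAX_THREADS];
    int data[MAX_THREADS];
    int created = 0, joined = 0;

    if (n > MAX_THREADS)
        n = MAX_THREADS;

    for (int i = 0; i < n; i++) {
        data[i] = i;
        if (pthread_create(&ids[created], NULL, worker, &data[i]) != 0)
            break;             /* creation failed: nothing to join for this one */
        created++;
    }

    /* A joinable thread keeps its resources until someone joins it. */
    for (int i = 0; i < created; i++) {
        if (pthread_join(ids[i], NULL) == 0)
            joined++;
    }
    return joined;
}

/* Start a thread that nobody will join. It is created detached, so its
 * resources are released automatically when it exits. Returns 0 on success. */
int start_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t id;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&id, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

With a detached thread, arg must stay valid for as long as the thread runs, so it should not point at a local variable of a function that may return first.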

Related

I want a function in C on linux to collect core dump without terminating the process

abort() does produce a core dump, but I don't want the process to terminate. dump_core() collects the core dump, but in kernel space. Is there any function equivalent to dump_core() in user space?
A simple way to do it yourself is to fork the process (which creates a complete copy of the parent process) and call abort from the child process.
The child process will be aborted with a core-dump, while the parent process continues as if nothing happened.
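A rough sketch of that fork-and-abort idea. The function name dump_core_via_fork is made up for illustration, and whether a core file is actually written depends on settings outside the program (for example the core size limit). Note also that fork only duplicates the calling thread, so in a multithreaded program the dump would not show the other threads.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a copy of the current process and abort the copy.
 * Returns  1 if the child was aborted and reported a core dump,
 *          0 if the child was aborted but no core dump was reported,
 *         -1 on any error. The calling process keeps running. */
int dump_core_via_fork(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        abort();                       /* child: dies from SIGABRT */

    /* Parent: reap the child so it does not linger as a zombie. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGABRT)
        return -1;
#ifdef WCOREDUMP
    return WCOREDUMP(status) ? 1 : 0;  /* WCOREDUMP is a common extension, not POSIX */
#else
    return 0;
#endif
}
```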
Use gcore.
.
.
.
char command[ 1024 ];
sprintf( command, "gcore -o /core/file/name %d", getpid() );
system( command );
.
.
.
Error and bounds checking are omitted.
There is no such function in the C library on Linux. However, you may find some third party tools that can do this for you. For example, Google coredumper, which is also supposed to be able to capture all the threads. Another way would be to attach gdb to your running process, and issue the gcore command. This is essentially what the gcore command line utility does.
The kernel sends SIGSEGV to a process that makes an invalid memory access, and the default action for that signal is to dump core. I think you could attach a handler to the SIGSEGV signal and call fork from that handler function.

How to configure GDB in Eclipse such that all processes keep on running including the process being debugged?

I am new to C programming and I have been trying hard to customize an open-source tool written in C to fit my organization's needs.
IDE: Eclipse,
Debugger: GDB,
OS: RHEL
The tool is multi-process in nature (main process executes first time and spawns several child processes using fork() ) and they share values in run time.
While debugging in Eclipse (using GDB), I find that only the process being debugged is running, while the other processes are suspended. As a result, the running process cannot do its intended job, because it depends on the suspended ones.
I read somewhere that the GDB command "set non-stop on" could keep the other processes running. I used that command in the gdbinit file shown below:
Note: I have replaced the default .gdbinit file with another gdbinit, because the default one does not let me debug child processes: the debugger terminates after the main process finishes.
Unfortunately, the debugger stops responding once this command is in place.
These are the commands I am using in the gdbinit file:
Commenting out non-stop lets Eclipse debug the current process as usual.
Addition: as the image below shows, only one process is running while the others are suspended.
Can anyone please help me to configure GDB according to my requirement?
Thanks in advance.
OK @n.m.: Actually, you were right. I should have spent more time understanding the flow of the code.
The tool creates 3 processes first; the third process then creates 5 threads and keeps calling wait() until a child thread terminates.
The top 5 entries (highlighted in blue) in the image below are threads, and they are children of process ID 17991.
The first two processes only initiate the basic functionality of the tool, so they simply run until they reach exit(0), as you can see below.
if (0 != (pid = zbx_fork()))
exit(0);
setsid();
signal(SIGHUP, SIG_IGN);
if (0 != (pid = zbx_fork()))
exit(0);
That was the reason I was not able to step into these 3 processes. Whenever I tried to do so, the main process terminated immediately, which in turn terminated all the other processes.
So I learned that I was supposed to "step into" threads only. And yes, I can now debug :)
This worked once I removed the command "set follow-fork-mode child". I now use the default ".gdbinit" file with "Automatically debug forked process" enabled.
Thanks everyone for your input. Stackoverflow is an awesome place to learn and share. :)

Underlying mechanism when pausing a process

I have a program that needs to pause and resume another program. To do this, I use the kill function, either from code with:
kill(pid, SIGSTOP); // to pause
kill(pid, SIGCONT); // to resume
Or from the command line with the similar
kill -STOP <pid>
kill -CONT <pid>
I can trace what's going on with the threads using this code, taken from Mac OS X Internals.
If a program is paused and immediately resumed, the state of threads can show as UNINTERRUPTIBLE. Most of the time, they report as WAITING, which is not surprising and if a thread is doing work, it will show as RUNNING.
What I don't understand is that when I pause a program and view the states of its threads, they still show as WAITING. I would have expected their state to be either STOPPED or HALTED.
Can someone explain why they still show as WAITING, and when they would be STOPPED or HALTED? In addition, is there another structure somewhere that shows the state of the program and its threads being halted in this way?
"Waiting" is shown in your case because you did not terminate the program but only paused it, whereas the Stopped or Halted state usually occurs when the program has stopped working abruptly due to some runtime error. As for your second question, I do not think there is another structure that shows the state of the program. Cheers
After researching and experimenting with the available structures, I've discovered that it is possible to tell whether a program has been halted by looking at the process suspend count. This is my solution:
#include <mach/mach.h>

// Returns 1 if the process is suspended, 0 if it is not.
// Returning -1 on a Mach error is one possible way to handle errors.
int ProcessIsSuspended(unsigned int pid)
{
    kern_return_t kr;
    mach_port_t task;
    mach_port_t mytask;

    mytask = mach_task_self();
    kr = task_for_pid(mytask, pid, &task);
    if (kr != KERN_SUCCESS)
        return -1;

    char infoBuf[TASK_INFO_MAX];
    struct task_basic_info_64 *tbi;
    int infoSize = TASK_INFO_MAX;
    kr = task_info(task, TASK_BASIC_INFO_64, (task_info_t)infoBuf, (mach_msg_type_number_t *)&infoSize);
    if (kr != KERN_SUCCESS)
        return -1;

    tbi = (struct task_basic_info_64 *) infoBuf;
    if (tbi->suspend_count > 0) // process is suspended
        return 1;
    return 0;
}
If suspend_count is 0, the program is running, else it is in a paused state, waiting to be resumed.

How do I tell how many threads a Linux binary is creating without source?

Suppose I have a generic binary without source and I want to determine whether it is running serially or spawns multiple threads.
Is there a way I can do this from the linux command line?
First install strace.
$ yum install strace
Run the program with strace, and look for clone or fork system calls. Here's a quick example with a program I wrote that just calls fork and returns.
$ strace ./a.out
execve("./a.out", ["./a.out"], [/* 43 vars */]) = 0
brk(0) = 0x74f000
...
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb22b16da10) = 6567
exit_group(1) = ?
+++ exited with 1 +++
You can use ps for that. From man ps:
-L Show threads, possibly with LWP and NLWP columns.
So you can do:
$ ps -L <pid>
and it will show you something like this:
PID LWP TTY STAT TIME COMMAND
4112 4112 ? Sl 65:35 /usr/lib/firefox/firefox
4112 4116 ? Sl 0:04 /usr/lib/firefox/firefox
Each line of the output corresponds to one thread. This, of course, only works for a certain moment in time. To track the spawning of threads, use strace, as suggested by Jonathon Reinhart.
An alternative to strace is, of course, gdb. See this question for details on managing threads in gdb. You may also read the thread section of the gdb manual. Quick introduction:
$ gdb /usr/lib/firefox/firefox <pid>
[... initialization output ...]
> info threads # lists threads
> thread <nr> # switch to thread <nr>
Your comment:
How can I figure out where to set an instruction-level breakpoint if the program only takes seconds to run?
This answer might help you here, as it shows how to break on thread creation (with pthread_create) using gdb. So every time a thread is created, execution stops and you might investigate.
Just run: cat /proc/<pid>/stat | awk '{print $20}' to get the number of threads of a running process.
proc manpage
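The same number can be read from C. As a sketch, this reads the "Threads:" line of /proc/<pid>/status instead of field 20 of /proc/<pid>/stat, because counting fields in stat goes wrong when the command name contains spaces. The function name count_threads is made up for illustration, and this only works on Linux with /proc mounted.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the number of threads of process pid, or -1 if it cannot be read. */
int count_threads(pid_t pid)
{
    char path[64];
    char line[256];
    int threads = -1;
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* Look for the line that reads "Threads:" followed by the count. */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "Threads: %d", &threads) == 1)
            break;
    }
    fclose(fp);
    return threads;
}
```

Calling count_threads(getpid()) from a single-threaded program should give 1. Like ps -L, this is only a snapshot of one moment in time.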

Making sure two processes interleave

In a C program on Linux, I call fork() followed by execve() twice to create two processes running two separate programs. How do I make sure that the execution of the two child processes interleaves?
Thanks
I tried to do this the way an answer below suggests, but the process seems to hang once it reaches sched_setscheduler(). The code is included below. replay1 and replay2 are two programs that simply print "Replay1" and "Replay2" respectively.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <sched.h>
void main()
{
int i,pid[5],pidparent,new=0;
char *newargv1[] = {"./replay1",NULL};
char *newargv2[] = {"./replay2",NULL};
char *newenviron[] = {NULL};
struct sched_param mysched;
mysched.sched_priority = 1;
sched_setscheduler(0,SCHED_FIFO, &mysched);
pidparent =getpid();
for(i=0;i<2;i++)
{
if(getpid()==pidparent)
{
pid[i] = fork();
if(pid[i] != 0)
kill(pid[i],SIGSTOP);
if(i==0 && pid[i]==0)
execve(newargv1[0], newargv1, newenviron);
if (i==1 && pid[i]==0)
execve(newargv2[0], newargv2, newenviron);
}
}
for(i=0;i<10;i++)
{
if(new==0)
new=1;
else
new=0;
kill(pid[new],SIGCONT);
sleep(100);
kill(pid[new], SIGSTOP);
}
}
Since you need random interleaving, here's a horrible hack to do it:
1. Immediately after forking, send a SIGSTOP to each application.
2. Set your parent application to have real-time priority with sched_setscheduler. This will allow you to have more fine-grained timers.
3. Send a SIGCONT to one of the child processes.
4. Loop: Wait a random, short time. Send a SIGSTOP to the currently-running application, and a SIGCONT to the other. Repeat.
This will help force execution to interleave. It will also make things quite slow. You may also want to try using sched_setaffinity to assign each process to a different CPU (if you have a dual-core or hyperthreaded CPU) - this will cause them to effectively run simultaneously, modulo wait times for I/O. I/O wait times (which could cause them to wait for the hard disk, at which point they're likely to wake up sequentially and thus not interleave) can be avoided by making sure whatever data they're manipulating is on a ramdisk (on linux, use tmpfs).
If this is too coarse-grained for you, you can use ptrace's PTRACE_SINGLESTEP operation to step one CPU operation at a time, interleaving as you see fit.
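The stop/continue loop described above could look roughly like this. It is a simplified sketch under several assumptions: the children are placeholders that only sleep (a real test would call execve in them), the time slice is fixed rather than random, the real-time priority step is left out because it normally needs root, and the names spawn_stopped and interleave_children are made up for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork a child and stop it right away. Returns the child's pid or -1. */
static pid_t spawn_stopped(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Placeholder workload; a real test would call execve() here. */
        for (;;)
            pause();
    }
    if (pid > 0) {
        int status;
        kill(pid, SIGSTOP);
        waitpid(pid, &status, WUNTRACED);  /* block until the stop took effect */
    }
    return pid;
}

/* Let two children run alternately, one slice at a time.
 * Returns how many times a child was resumed and then seen to stop again. */
int interleave_children(int rounds, long slice_us)
{
    pid_t pid[2];
    struct timespec slice = { slice_us / 1000000L, (slice_us % 1000000L) * 1000L };
    int current = 0, switches = 0, status;

    pid[0] = spawn_stopped();
    pid[1] = spawn_stopped();

    for (int i = 0; pid[0] > 0 && pid[1] > 0 && i < rounds; i++) {
        kill(pid[current], SIGCONT);       /* let one child run ... */
        nanosleep(&slice, NULL);           /* ... for one time slice ... */
        kill(pid[current], SIGSTOP);       /* ... then stop it again */
        if (waitpid(pid[current], &status, WUNTRACED) == pid[current] &&
            WIFSTOPPED(status))
            switches++;
        current = 1 - current;             /* the other child goes next */
    }

    /* Clean up: SIGKILL also terminates stopped processes. */
    for (int k = 0; k < 2; k++) {
        if (pid[k] > 0) {
            kill(pid[k], SIGKILL);
            waitpid(pid[k], &status, 0);
        }
    }
    return switches;
}
```

Waiting with WUNTRACED after each SIGSTOP makes sure a child has really stopped before the other one is resumed, so at most one of them runs at any time.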
As this is for testing purposes, you could place sched_yield(); calls after every line of code in the child processes.
Another potential idea is to have a parent process ptrace() the child processes, and use PTRACE_SINGLESTEP to interleave the two process's execution on an instruction-by-instruction basis.
If you need to synchronize them and they are your own processes, use semaphores. If you do not have access to the source, then there is no way to synchronize them.
If your aim is to do concurrency testing, I know of only two techniques:
Test exact scenarios using synchronization. For example, process 1 opens a connection and executes a query, then process 2 comes in and executes a query, then process1 gets active again and gets the results, etc. You do this with synchronization techniques mentioned by others. However, getting good test scenarios is very difficult. I have rarely used this method in the past.
In random you trust: fire up a high number of test processes that execute a long running test suite. I used this method for both multithreading and multiprocess testing (my case was testing device driver access from multiple processes without blue screening out). Usually you want to make the number of processes and number of iterations of the test suite per process configurable so that you can either do a quick pass or do a longer test before a release (running this kind of test with 10 processes for 10-12 hours was not uncommon for us). A usual run for this sort of testing is measured in hours. You just fire up the processes, let them run for a few hours, and hope that they will catch all the timing windows. The interleaving is usually handled by the OS, so you don't really need to worry about it in the test processes.
Job control is much simpler in Bash than in C. Try this:
#! /bin/bash
stop ()
{
echo "$1 stopping"
kill -SIGSTOP $2
}
cont ()
{
echo "$1 continuing"
kill -SIGCONT $2
}
replay1 ()
{
while sleep 1 ; do echo "replay 1 running" ; done
}
replay2 ()
{
while sleep 1 ; do echo "replay 2 running" ; done
}
replay1 &
P1=$!
stop "replay 1" $P1
replay2 &
P2=$!
stop "replay 2" $P2
trap "kill $P1;kill $P2" EXIT
while sleep 1 ; do
cont "replay 1 " $P1
cont "replay 2" $P2
sleep 3
stop "replay 1 " $P1
stop "replay 2" $P2
done
The two processes are running in parallel:
$ ./interleave.sh
replay 1 stopping
replay 2 stopping
replay 1 continuing
replay 2 continuing
replay 2 running
replay 1 running
replay 1 running
replay 2 running
replay 1 stopping
replay 2 stopping
replay 1 continuing
replay 2 continuing
replay 1 running
replay 2 running
replay 2 running
replay 1 running
replay 2 running
replay 1 running
replay 1 stopping
replay 2 stopping
replay 1 continuing
replay 2 continuing
replay 1 running
replay 2 running
replay 1 running
replay 2 running
replay 1 running
replay 2 running
replay 1 stopping
replay 2 stopping
^C

Resources