Tracing pthreads in Linux? - c

I could not find any tool for tracing the pthreads threads of a Linux process. I want something like strace/ltrace: is there anything that shows the calls in real time?
Thank you

strace works for threads as well. Use strace -f to trace all threads.
To trace only a particular thread, you first have to find its TID (thread ID).
Each thread has a thread ID that, from the kernel's point of view, is really a PID (process ID).
Once you know the thread's TID, use strace -p <tid> to trace just that thread.
The TIDs of all the threads in a process are listed in /proc/<pid>/task/, and a thread can learn its own TID with the gettid() call (or syscall(SYS_gettid) on glibc versions older than 2.30, which lack the wrapper).

Actually, strace is not as good as perf here; with the perf tool you can get more information.
For example, if some of your threads hang and you want to find out which function calls are hanging,
strace -p <pid> gives limited information, whereas perf top or perf top -t <tid> shows more.

Related

waitpid() function returns ERROR (-1), why?

I'm writing a Linux shell-like program in C.
Among others, I'm implementing two built-in commands: jobs, history.
In jobs, I print the list of currently working commands (in the background).
In history I print the list of all commands history until now, specifying for each command if it's RUNNING or DONE.
To implement the two, my idea was to keep a list of commands, mapping each command name to its PID. When the jobs/history command is called, I run through the list, check which commands are running or done, and print accordingly.
I read online that waitpid(pid, &status, WNOHANG) can tell from the PID whether a process is still running or done, without blocking.
It works well, except for this:
While a program is alive, the function reports it as running.
When a program has finished, the first call reports it as done, but every later call with the same PID returns -1 (ERROR).
For example (the & marks a background command):
$ sleep 3 &
$ jobs
sleep ALIVE
$ jobs (within the 3 seconds)
sleep ALIVE
$ jobs (after 3 seconds)
sleep DONE
$ jobs
sleep ERROR
$ jobs
sleep ERROR
....
Also, this behavior does not depend on any other commands I run before or after.
I read online various reasons why waitpid might return -1, but I wasn't able to identify the reason in my case. I also tried to find out how to tell which waitpid error occurred, but again without success.
My questions are:
Why is this behavior happening?
Is there a solution? (Ideally it would keep returning DONE.)
If you have a better idea for implementing the jobs/history commands, it is welcome.
One solution is that as soon as I get "DONE", I mark the command as DONE and stop calling waitpid on it before printing it. This would solve the issue, but I would still be in the dark as to WHY this happens.
You should familiarize yourself with how child processes are handled on Unix environments. In particular read about Zombie processes.
When a process dies, it enters a 'zombie' state, so that its PID is still reserved and uniquely identifies the now-dead process. A successful wait on a zombie process frees the process descriptor and its PID. Consequently, subsequent calls to wait on the same PID fail because there is no longer a process with that PID (unless a new process has been allocated the same PID, in which case waiting on it would be a logic error).
You should restructure your program so that if a wait is successful and reports that a process is DONE, you record that information in your own data structure and never call wait on that PID again.
For comparison, once a process is done, the Bourne shell reports it one last time and then removes it from the list of jobs:
$ sleep 10 &
$ jobs
[1] + Running sleep 10
$ jobs
[1] + Running sleep 10
$ jobs
[1] Done sleep 10
$ jobs
$

Kill the program launched with system() in child thread

I have a main program from which I create two threads using pthread_create(). In one thread, I call:
/* Thread I */
void *thread_one(void *arg)
{
    /* ... */
    system("binary application");
    return NULL;
}
system() internally forks a child process. How can I kill that "binary application" from the main program?
That's not directly supported.
You need the PID to kill a process, and system() is designed for the synchronous execution of a command: it doesn't expose the PID of the invoked command. Indeed, system() may spawn several processes across several generations of descendants, typically /bin/sh, which in turn runs your binary application.
How would you kill the binary application from an external process (not a thread, a completely separate process)? However you would do that is probably how your killing thread can get the PID.
It's probably easier to set an alarm on the command, or instead call fork() (which gives you the PID) and one of the exec() functions in your own code. In any case, system() in a multithreaded program can be tricky, so take care.

List all threads

How can I list all threads within the current process in FreeBSD? Or at least, get the number of threads running.
I found the HP-UX call pstat_getproc, which returns a struct containing pst_nlwps, the number of threads. I am looking for something similar on FreeBSD.
Or perhaps there is something like /dev/fd, but for threads.
Anything I can use to get some idea of how many other threads are running.
I want to be able to do this programmatically in C, not using an existing application.
Use procstat(1), e.g.
# procstat -t $(pgrep openvpn)
PID TID COMM TDNAME CPU PRI STATE WCHAN
537 100051 openvpn - 0 120 sleep select
procstat is built on libprocstat(3), which you can also call from C.

Relation between Thread ID and Process ID

I am confused about the difference between a process ID and a thread ID. I have read several web posts, including one here on Stack Overflow, which says:
starting a new process gives you a new PID and a new TGID, while starting a new thread gives you a new PID while maintaining the same TGID.
So when I run a program, why don't all the threads created by the program have different PIDs?
I know that in programming we usually say that main is a thread and that execution starts from main, so if I create multiple threads from main, all the threads will have the same PID, equal to main's PID.
So what I want to ask is:
1) When we run a program, does it run as a process or as a thread?
2) Is there any difference between the main thread creating threads and a process creating threads?
3) Is there any difference between a thread and a process in Linux? I read somewhere that Linux doesn't differentiate between threads and processes.
Simplifying a bit:
1) The PID is the process ID and the TID is the thread ID. For the first thread of a process (the one created by fork()), PID == TID. If you create more threads within the process (on Linux, pthread_create() does this with the clone() system call), the new threads get their own TIDs, which differ from the PID. They are usually larger than the PID, although PID wraparound means this is not guaranteed.
2) No, there is no difference, except that if main exits (returns or calls exit()), all the other threads are terminated too.
3) Yes: the thread is what actually gets scheduled. Technically, the process is only a memory mapping of the different segments (text, bss, stack, heap and the OS).
This confusion comes from the Linux concept of tasks.
In Linux there is little difference between a task and a thread, though.
Every process is a self-contained VM running at least one task.
Each task is an independent execution unit within a process's scope.
The main task of a process gives its task ID (TID) to the process as its process ID (PID).
Every new thread that you spawn within a process creates a new task within it. So that the kernel can identify them individually, each gets its own task ID (TID).
All tasks within a process share the same task group ID (TGID).
I found the answer here on Stack Overflow. It says that if you run a program on a Linux system whose libc is libuClibc-0.9.30.1.so (1), basically an older libc, then each thread created gets a different PID, as shown below:
root#OpenWrt:~# ./test
main thread pid is 1151
child thread pid is 1153
I then ran the same program on a Linux system with the Ubuntu libc6 (2), i.e. a newer libc, and there the created thread has the same PID as the process:
$ ./test
main thread pid is 2609
child thread pid is 2609
Libc (1) uses the LinuxThreads implementation of pthreads.
Libc (2) uses the NPTL ("Native POSIX Thread Library") implementation of pthreads.
According to the LinuxThreads FAQ (answer J.3):
each thread is really a distinct process with a distinct PID, and signals sent to the PID of a thread can only be handled by that thread
So with the old libc, which uses LinuxThreads, each thread has its own distinct PID.
With the newer libc, which uses NPTL, all threads have the same PID as the main process.
NPTL was developed by a Red Hat team, and according to Red Hat's NPTL document, one of the problems solved by NPTL is:
(Chapter: Problems with the Existing Implementation, page 5)
Each thread having a different process ID causes compatibility problems with other POSIX thread implementations. This is in part a moot point since signals can't be used very well but is still noticeable
And that explains this issue.
I am using the new libc version, which contains the NPTL ("Native POSIX Thread Library") implementation of pthreads.
The post you have shown describes the older Linux threading implementation, in which threads were created as separate processes.
In the POSIX implementation of threads, threads are not created as separate processes; instead they are parallel streams of execution within the same process. The parts that differ between those streams are recorded in a thread descriptor, which stores the TID.
A process that creates multiple threads is called a multi-threaded process: all its threads share the same PID but have different TIDs. The thread that the process starts with, and which creates the others, is called the main thread.
You get the same process ID from every thread because all the threads share your program's data, i.e. your process, so every request for the process ID returns the same value.

Why "kill -15" fails sometimes?

I have a program written in C. This program contains 2 sub-threads. Sometimes, when I try to stop my application with kill -15 <pid of main thread>, the application does not exit. In that state I can see only the PID of the main thread in the ps aux output (the PIDs of the sub-threads are not displayed). Repeatedly sending kill -15 <pid> to the remaining PID does not terminate the process either. Only kill -9 <pid> terminates it.
This happens about 3 times in 1000 tries.
The OS is OpenWRT Linux
The kernel version is 2.6.30
Libs: libuClibc-0.9.30.1.so and libpthread-0.9.30.1.so
Please do not consider this topic a duplicate of this one, because my program does not install a sigaction handler.
It's not a duplicate, but the answer is the same. Attach strace or gdb and see what the process is doing when it hangs. There are really only two explanations: either you (or some library code you're using) blocked SIGTERM with sigprocmask, or the process is stuck in an uninterruptible sleep in the kernel, which is usually the result of trying to access a failing storage device such as a dying hard drive or a scratched optical disc.
Could you elaborate on what OS, kernel version, libraries, etc. you're using?
