I intercepted the pthread_create call to capture the relationships among all the threads. However, I found that some thread creations were not recorded when intercepting only pthread_create. I also tried to intercept the posix_spawn, posix_spawnp and clone calls, but there are still threads running in my experiment whose creator I don't know. So are there any other ways to create threads on Linux?
More specifically, I used LD_PRELOAD to intercept the pthread_create call; the code fragment is shown below:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>

typedef int (*P_CREATE)(pthread_t *, const pthread_attr_t *,
                        void *(*)(void *), void *);

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg)
{
    static void *handle = NULL;
    static P_CREATE old_create = NULL;
    if (!handle)
    {
        handle = dlopen("libpthread.so.0", RTLD_LAZY);
        old_create = (P_CREATE)dlsym(handle, "pthread_create");
    }
    pthread_t tmp = pthread_self();
    // print tmp (the parent thread)
    int result = old_create(thread, attr, start_routine, arg);
    // print *thread (the newly created thread)
    return result;
}
In this way, I capture each thread creation. I did the same for clone, but clone was never actually called by the application. Sometimes I get a parent-child thread pair whose parent thread was never printed before, so I don't know whether there are other ways this parent thread could have been created.
Even more specifically, the application on top is a MapReduce job running on JVM 1.7. I want to observe all the threads and processes and their relationships.
Thank you.
(moving from the comment)
LD_PRELOAD tricks just let you intercept C calls to external libraries: in this particular case, to libpthread (for pthread_create) and to libc (for fork/clone). But to create threads, a program can bypass them completely and talk straight to the kernel, invoking such syscalls (clone in particular) with int 0x80 or sysenter (on 32-bit x86) or the syscall instruction (on x86-64).
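As a minimal illustration of bypassing the wrappers, assuming Linux and glibc (`raw_fork` and `raw_fork_demo` are hypothetical names), this sketch creates a child by issuing the clone system call through glibc's generic syscall() function, so no fork()/clone() wrapper is ever called. It still enters the kernel via libc's syscall() helper; a program could equally use inline assembly and avoid libc entirely.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a child by invoking the clone syscall directly. With
 * flags = SIGCHLD and a zero stack pointer it behaves like fork():
 * the child gets a copy-on-write copy of the caller's stack.
 * An LD_PRELOAD hook on fork() or clone() never sees this call. */
static pid_t raw_fork(void)
{
    return (pid_t)syscall(SYS_clone, (long)SIGCHLD, 0L, 0L, 0L, 0L);
}

/* Returns the exit status of a child created via raw_fork(), or -1. */
int raw_fork_demo(void)
{
    int status;
    pid_t pid = raw_fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(7);                     /* child: exit immediately */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child calls only _exit(), since a raw clone skips the bookkeeping glibc's fork() does (for example, running pthread_atfork handlers).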
Straight syscalls cannot be intercepted that easily; you generally need the help of the kernel itself, which usually means the ptrace interface (incidentally, that is how tools like strace and debuggers are implemented). In particular, look at the PTRACE_O_TRACECLONE, PTRACE_O_TRACEVFORK and PTRACE_O_TRACEFORK options to trace the creation of new processes/threads, or simply PTRACE_SYSCALL to stop at every syscall.
Setting up ptrace is a bit laborious and I don't have much time now, but there are several examples on the Internet of the basic ptrace loop that you'll surely be able to find and adapt to your objective.
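A minimal sketch of such a ptrace loop, assuming Linux and glibc (`trace_program` is a hypothetical name). It starts a program as a tracee, enables PTRACE_O_TRACECLONE, PTRACE_O_TRACEFORK and PTRACE_O_TRACEVFORK (plus PTRACE_O_TRACEEXEC, so execve doesn't deliver a plain SIGTRAP), and prints a parent/child line for every creation event the kernel reports, however the tracee issued the syscall.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv (argv[0] is the program) under ptrace and prints
 * "parent -> child (kind)" for every clone/fork/vfork the kernel reports.
 * Returns the number of creation events, or -1 if tracing failed. */
int trace_program(char *argv[])
{
    int status, events = 0;
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);                  /* ptrace not permitted here */
        raise(SIGSTOP);                  /* let the tracer set options first */
        execvp(argv[0], argv);
        _exit(127);
    }
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    /* Auto-attached tracees inherit these options. */
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)(PTRACE_O_TRACECLONE | PTRACE_O_TRACEFORK |
                          PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC));
    ptrace(PTRACE_CONT, child, NULL, NULL);

    for (;;) {
        pid_t tid = waitpid(-1, &status, __WALL);
        if (tid == -1) {
            if (errno == EINTR)
                continue;
            break;                       /* ECHILD: no tracees left */
        }
        if (!WIFSTOPPED(status))
            continue;                    /* a tracee exited or was killed */

        int event = status >> 16, sig = 0;
        if (event == PTRACE_EVENT_CLONE || event == PTRACE_EVENT_FORK ||
            event == PTRACE_EVENT_VFORK) {
            unsigned long new_tid = 0;
            ptrace(PTRACE_GETEVENTMSG, tid, NULL, &new_tid);
            printf("%d -> %lu (%s)\n", (int)tid, new_tid,
                   event == PTRACE_EVENT_CLONE ? "clone" :
                   event == PTRACE_EVENT_FORK ? "fork" : "vfork");
            fflush(stdout);
            events++;
        } else if (event == 0 && WSTOPSIG(status) != SIGSTOP) {
            sig = WSTOPSIG(status);      /* forward ordinary signals */
        }
        ptrace(PTRACE_CONT, tid, NULL, (void *)(long)sig);
    }
    return events;
}
```

Simplifications: every SIGSTOP is swallowed (including genuine ones) and ptrace errors are ignored. The IDs printed are kernel thread IDs (what gettid() returns), not pthread_t values, so correlating with an LD_PRELOAD hook would require printing gettid() there too.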
Related
Let's say we have a multi-threaded Linux x86-64 executable (written in C, for example) with three threads: main, consumer and producer. Some of the functions are intended to be used by certain threads only. For example, the produce() function should only ever be called by the producer thread. If another thread (such as consumer) calls produce(), I would like to get a fatal error (a SIGABRT or SIGSEGV, for example).
One way to deal with this is to register the thread ID and check that the thread calling produce() is in fact the producer thread; if not, call abort(). Unfortunately, that method requires a runtime check on every function call, which may be prohibitive if the function is in a hot path.
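For reference, the runtime check described above might look like this minimal sketch (all names are hypothetical). The per-call cost is one pthread_self() and one pthread_equal(); it assumes register_producer() runs in the producer thread before any other thread can call produce().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names; registration must happen before other threads
 * start calling produce(). */
static pthread_t producer_tid;
static int producer_registered = 0;
static int produced = 0;

void register_producer(void)
{
    producer_tid = pthread_self();
    producer_registered = 1;
}

void produce(void)
{
    /* The runtime check: abort() unless called from the producer thread. */
    if (!producer_registered || !pthread_equal(pthread_self(), producer_tid))
        abort();
    produced++;
}

static void *producer_main(void *arg)
{
    (void)arg;
    register_producer();
    for (int i = 0; i < 3; i++)
        produce();
    return NULL;
}

/* Runs a producer thread that calls produce() three times. */
int run_producer_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, producer_main, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return produced;
}
```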
I'm wondering whether there's another way, such as annotating all producer-only functions, moving them to their own section, and removing execute permission on that memory for all the other threads. My understanding is that this wouldn't work, since mprotect() sets process-wide permissions. Is that right?
Edit:
@AlanAu asks whether this check has to be done at runtime. It's not a requirement, but my understanding is that for non-trivial programs (ones using function pointers, for example) such a check could only work at runtime.
Edit2:
I realize using processes would help address this, but as noted in the comments, inter-thread communication is more efficient.
One (rather hacky) way of doing this is to create pointers to these functions and call through the pointers instead of calling the functions themselves. Example:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

void disallowed_call(void)
{ abort(); }

void testfunc(void)
{
    printf("Hello, world!\n");
}

void childcode(void (*notrestricted)(void), void (*restricted)(void))
{
    printf("Non restricted call:\n");
    notrestricted();
    printf("Restricted call:\n");
    fflush(stdout);          /* make the output visible before a possible abort() */
    restricted();
}

int main(void)
{
    pid_t pid = fork();
    if (pid == 0)            /* fork() returns 0 in the child */
    {
        childcode(&testfunc, &testfunc);
    }
    else
    {
        childcode(&testfunc, &disallowed_call);
    }
    return 0;
}
That might be a bit more complicated than you were looking for, but it should work. The runtime check is done only once.
I have a multithreaded application in which each thread has a helper thread that helps it accomplish a task. When a thread terminates (likely by calling exit), I would like its helper thread to terminate as well.
I know that I could use exit_group, but this system call kills all threads in the calling thread's thread group. For example, if my application has 10 threads (and therefore 10 additional helper threads), I would like only the terminating thread and its associated helper thread to terminate, while the other threads keep running.
My application works exclusively on Linux.
How can I have this behavior?
Reading about multithreading, I got a bit confused about the concepts of thread group and process group in Linux. Do these terms refer to the same thing?
Specifically, is the process group (and perhaps the thread group) the ID retrieved by one of the following calls?
pid_t getpgid(pid_t pid);
pid_t getpgrp(void); /* POSIX.1 version */
pid_t getpgrp(pid_t pid); /* BSD version */
You are a bit adrift here. Forget exit_group, which these days is what exit does on Linux anyway; it is not what you are looking for. Similarly, the various get-pid calls aren't really what you want either.
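On the terminology question: on Linux, the "thread group" is the set of threads making up one process, and getpid() returns its ID (the thread group ID); each thread also has its own kernel thread ID (gettid). A process group is a different, job-control concept: a set of whole processes, returned by getpgrp()/getpgid(). A small sketch (hypothetical names; assumes Linux and glibc) that collects all three from two threads:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The three IDs, as seen from one thread. */
struct ids {
    pid_t tgid;   /* thread group ID: what getpid() returns, same in all threads */
    pid_t tid;    /* kernel thread ID: unique per thread */
    pid_t pgrp;   /* process group ID: a job-control grouping of processes */
};

static void get_ids(struct ids *out)
{
    out->tgid = getpid();
    out->tid = (pid_t)syscall(SYS_gettid);   /* gettid() wrapper needs glibc >= 2.30 */
    out->pgrp = getpgrp();
}

static void *thread_fn(void *arg)
{
    get_ids(arg);
    return NULL;
}

/* Collects the IDs from the calling thread and from one new thread. */
int collect_ids(struct ids *caller, struct ids *other)
{
    pthread_t t;
    get_ids(caller);
    if (pthread_create(&t, NULL, thread_fn, other) != 0)
        return -1;
    pthread_join(t, NULL);
    printf("caller: tgid=%d tid=%d pgrp=%d\n",
           (int)caller->tgid, (int)caller->tid, (int)caller->pgrp);
    printf("other:  tgid=%d tid=%d pgrp=%d\n",
           (int)other->tgid, (int)other->tid, (int)other->pgrp);
    return 0;
}
```

Called from the main thread, tid equals tgid there; the second thread shares tgid and pgrp but has its own tid. None of these IDs names a worker/helper pair, which is why they don't help with the shutdown problem.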
The simplest (and usually best) way to handle this is have each primary thread signal its helper thread to shut down and then pthread_join it - or not if it is detached.
So something like:
(a) primary work thread knows - however it knows - its work is done.
(b) signals helper thread via a shared switch or similar mechanism
(c) helper thread periodically checks flag, cleans up and calls pthread_exit
(d) primary worker thread calls pthread_join (or not) on dead helper thread
(e) primary worker cleans up and calls pthread_exit on itself.
There are a lot of variations on that, but that's the basic idea. Beyond that, you get into things like pthread_cancel, areas you may want to avoid unless you absolutely need them (along with their potential headaches).
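Steps (a) to (e) can be sketched as follows. This assumes C11 atomics for the "shared switch" (here "signal" means notify, not a POSIX signal); struct pair, helper_main, primary_main and run_pairs are hypothetical names, and nanosleep stands in for real work.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* One primary/helper pair; the names are illustrative. */
struct pair {
    atomic_bool stop;          /* (b) the shared switch */
    pthread_t helper;
    long helper_rounds;        /* written by the helper, read after join */
};

static void *helper_main(void *arg)
{
    struct pair *p = arg;
    struct timespec tick = { 0, 1000000 };       /* 1 ms */
    do {
        p->helper_rounds++;                      /* ... helper work ... */
        nanosleep(&tick, NULL);
    } while (!atomic_load(&p->stop));            /* (c) check the flag */
    /* (c) helper cleans up here */
    return NULL;                                 /* same as pthread_exit(NULL) */
}

static void *primary_main(void *arg)
{
    struct pair *p = arg;
    atomic_init(&p->stop, false);
    p->helper_rounds = 0;
    if (pthread_create(&p->helper, NULL, helper_main, p) != 0)
        return NULL;
    struct timespec work = { 0, 20000000 };      /* (a) "work" for 20 ms */
    nanosleep(&work, NULL);
    atomic_store(&p->stop, true);                /* (b) tell the helper to stop */
    pthread_join(p->helper, NULL);               /* (d) reap the helper */
    /* (e) primary cleans up here */
    return NULL;
}

/* Runs n independent pairs; stopping one pair never touches the others. */
int run_pairs(struct pair *pairs, int n)
{
    pthread_t primaries[64];
    int created = 0;
    if (n < 1 || n > 64)
        return -1;
    for (; created < n; created++)
        if (pthread_create(&primaries[created], NULL, primary_main,
                           &pairs[created]) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(primaries[i], NULL);
    return created == n ? 0 : -1;
}
```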
I have a function that is called millions of times, and the work done by this function is multithreaded. Here is the function:
void functionCalledSoManyTimes()
{
for (int i = 0; i < NUM_OF_THREADS; i++)
{
pthread_create(&threads[i], &attr, thread_work_function, (void *)&thread_data[i]);
}
// wait
}
I'm creating the threads each time the function is called, and I give each thread its data struct (which was set once at the beginning of the algorithm) to use in thread_work_function. thread_work_function simply processes a series of arrays, and the thread_data struct contains pointers to those arrays and the indices that each thread is responsible for.
Although multithreading the algorithm in this way did improve the performance by more than 20%, my profiling shows that the repetitive calls to pthread_create are causing a significant overhead.
My question is: Is there a way to achieve my goal without calling pthread_create each time the function is called?
Problem Solved.
Thank you guys, I really appreciate your help! I've written a solution here using your tips.
Just start a fixed set of threads and use an inter-thread communication system (ring buffer, for instance) to pass the data to process.
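A minimal sketch of a fixed set of threads created once and reused on every call. It swaps the ring buffer for a pair of POSIX barriers, which fits the fork-join shape of functionCalledSoManyTimes; the summing work, data, partial, pool_start and pool_stop are hypothetical, and the function returns a value here only so the result can be checked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_OF_THREADS 4
#define N 1000

/* Hypothetical work: each thread sums its slice of data[] into partial[]. */
static double data[N];
static double partial[NUM_OF_THREADS];

static pthread_t threads[NUM_OF_THREADS];
static pthread_barrier_t start_barrier, done_barrier;
static bool quit = false;   /* barriers synchronize memory, so no atomics needed */

static void *worker(void *arg)
{
    int id = (int)(long)arg;
    for (;;) {
        pthread_barrier_wait(&start_barrier);   /* wait for a new round */
        if (quit)
            break;
        double s = 0;
        for (int i = id; i < N; i += NUM_OF_THREADS)
            s += data[i];
        partial[id] = s;
        pthread_barrier_wait(&done_barrier);    /* report this round done */
    }
    return NULL;
}

void pool_start(void)
{
    pthread_barrier_init(&start_barrier, NULL, NUM_OF_THREADS + 1);
    pthread_barrier_init(&done_barrier, NULL, NUM_OF_THREADS + 1);
    for (long i = 0; i < NUM_OF_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
}

/* Replaces per-call pthread_create/pthread_join with two barrier waits. */
double functionCalledSoManyTimes(void)
{
    pthread_barrier_wait(&start_barrier);   /* release the workers */
    pthread_barrier_wait(&done_barrier);    /* wait until all have finished */
    double total = 0;
    for (int i = 0; i < NUM_OF_THREADS; i++)
        total += partial[i];
    return total;
}

void pool_stop(void)
{
    quit = true;
    pthread_barrier_wait(&start_barrier);   /* workers wake, see quit, exit */
    for (int i = 0; i < NUM_OF_THREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&start_barrier);
    pthread_barrier_destroy(&done_barrier);
}
```

Each call now costs two barrier waits instead of NUM_OF_THREADS thread creations. Because the pool lives in static storage, this assumes only one thread calls functionCalledSoManyTimes at a time.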
Solving the problem gracefully is not so easy. You can use static storage for a thread pool, but then what happens if functionCalledSoManyTimes itself can be called from multiple threads? It's not a good design.
What I would do to handle this sort of situation is create a thread-local storage key with pthread_key_create on the first call (using pthread_once), and store your thread pool there with pthread_setspecific the first time functionCalledSoManyTimes gets called in a given thread. You can provide a destructor function to pthread_key_create, which will get called when the thread exits, and this function can then be responsible for signaling the worker threads in the thread pool to terminate themselves (via pthread_cancel or some other mechanism).
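A minimal sketch of that pattern, with a stub "pool" struct so the mechanics stay visible (struct thread_pool, get_pool and use_pool_twice are hypothetical). pthread_once creates the key exactly once, each thread lazily creates its own pool on first use, and the key's destructor runs when that thread exits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread pool; a real one would own its worker threads. */
struct thread_pool {
    int id;
};

static pthread_key_t pool_key;
static pthread_once_t pool_key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int pools_created = 0, pools_destroyed = 0;

/* Called automatically when a thread that owns a pool exits. A real
 * version would tell the pool's workers to terminate and join them here. */
static void pool_destructor(void *p)
{
    pthread_mutex_lock(&count_lock);
    pools_destroyed++;
    pthread_mutex_unlock(&count_lock);
    free(p);
}

static void make_key(void)
{
    pthread_key_create(&pool_key, pool_destructor);
}

/* Returns the calling thread's pool, creating it on first use. */
struct thread_pool *get_pool(void)
{
    pthread_once(&pool_key_once, make_key);
    struct thread_pool *p = pthread_getspecific(pool_key);
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
        pthread_mutex_lock(&count_lock);
        p->id = ++pools_created;
        pthread_mutex_unlock(&count_lock);
        if (pthread_setspecific(pool_key, p) != 0) {
            free(p);
            return NULL;
        }
    }
    return p;
}

/* Example thread body standing in for a caller of functionCalledSoManyTimes:
 * stores 1 in *out if two calls return the same pool. */
static void *use_pool_twice(void *out)
{
    struct thread_pool *a = get_pool();
    struct thread_pool *b = get_pool();
    *(int *)out = (a != NULL && a == b);
    return NULL;
}
```

The destructor runs for threads that return from their start routine or call pthread_exit; it does not run for the main thread when the process ends by returning from main.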
Is it possible to read the registers or thread local variables of another thread directly, that is, without requiring to go to the kernel? What is the best way to do so?
You can't read the registers, which wouldn't be useful anyway. But reading thread local variables from another thread is easily possible.
Depending on the architecture (e.g. one with strong memory ordering, like x86_64), you can safely do it even without synchronization, provided that the value read doesn't affect the thread it belongs to in any way. One scenario would be displaying a thread-local counter or similar.
Specifically, on Linux on x86_64 as you tagged, you could do it like this:
#include <pthread.h>

#define NTHREADS 4

// A thread-local variable. __thread is a GCC extension; C11 has
// _Thread_local and C++11 has thread_local.
__thread int some_tl_var;

// The pointer to the thread-local. In itself NOT thread-local, as it will be
// read from the outside world.
struct thread_data {
    int *psome_tl_var;
    /* ... other per-thread fields ... */
};

// Must outlive the threads, so it is not a local of start_threads().
static struct thread_data other_thread_data[NTHREADS];
static pthread_t tids[NTHREADS];

// The function started by pthread_create. The pointer needs to be initialized
// here, and NOT when the storage for the objects used by the thread is
// allocated (otherwise it would point to the thread-local of the controlling
// thread).
void *thread_run(void *arg) {
    struct thread_data *pdata = arg;
    pdata->psome_tl_var = &some_tl_var;
    // Now do some work...
    // ...
    return NULL;
}

void start_threads(void) {
    for (int i = 0; i < NTHREADS; ++i)
        pthread_create(&tids[i], NULL, thread_run, &other_thread_data[i]);
    // Once thread i has stored its pointer, and only while it is still
    // running, you can access its some_tl_var as
    //     int value = *(other_thread_data[i].psome_tl_var);
    /* ... */
}
I used something similar for displaying statistics about worker threads. It is even easier in C++ if you create objects around your threads: just make the pointer to the thread-local a field in your thread class and access it with a member function.
Disclaimer: this is non-portable, but it works on x86_64, Linux and GCC, and may work on other platforms too.
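A runnable variant of the counter scenario above. As a swapped-in choice, it uses C11 atomics so the cross-thread read is well-defined by the language rather than relying on x86-64 ordering, and it adds the handshake the fragment leaves out: the observer waits until the worker has published the address of its thread-local, and only dereferences it while the worker is still running. All names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Thread-local counter; each worker gets its own instance. */
static _Thread_local atomic_long tl_counter;

struct worker {
    _Atomic(atomic_long *) pcounter;   /* published by the worker itself */
    atomic_bool stop;
};

static void *worker_run(void *arg)
{
    struct worker *w = arg;
    atomic_store(&w->pcounter, &tl_counter);    /* publish this thread's copy */
    while (!atomic_load(&w->stop))
        atomic_fetch_add(&tl_counter, 1);        /* the "work" being counted */
    atomic_store(&w->pcounter, NULL);            /* the TLS dies with the thread */
    return NULL;
}

/* Starts one worker, reads its thread-local counter from this thread until
 * it reaches target, then stops the worker. Returns the last value read. */
long observe_worker(long target)
{
    struct worker w;
    atomic_long *p;
    long seen;
    pthread_t t;

    atomic_init(&w.pcounter, NULL);
    atomic_init(&w.stop, false);
    if (pthread_create(&t, NULL, worker_run, &w) != 0)
        return -1;
    while ((p = atomic_load(&w.pcounter)) == NULL)
        ;                                        /* wait for the handshake */
    while ((seen = atomic_load(p)) < target)
        ;                                        /* read the other thread's TLS */
    atomic_store(&w.stop, true);
    pthread_join(t, NULL);
    return seen;
}
```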
There's no way to do it without involving the kernel, and in fact I don't think it could be meaningful to read them anyway without some sort of synchronization. If you don't want to use ptrace (which is ugly and non-portable) you could instead choose one of the realtime signals to use for a "send me your registers/TLS" message. The rough idea is:
Lock a global mutex for the request.
Store the information on what data you want (e.g. a pthread_key_t or a special value meaning registers) from the thread in global variables.
Signal the target thread with pthread_kill.
In the signal handler (which should have been installed with sigaction and SA_SIGINFO) use the third void * argument to the signal handler (which really points to a ucontext_t) to copy that ucontext_t to the global variable used to communicate back to the requesting thread. This will give it all the register values, and a lot more. Note that TLS is a bit more tricky since pthread_getspecific is not async-signal-safe and technically not legal to run in this context...but it probably works in practice.
The signal handler posts a semaphore (this is the ONLY async-signal-safe synchronization function offered by POSIX) indicating to the requesting thread that it's done, and returns.
The requesting thread finishes by waiting on the semaphore, then reads the data and unlocks the request mutex.
Note that this will involve at least 1 transition to kernelspace (pthread_kill) in the requesting thread (and maybe another in sem_wait), and 1-3 in the target thread (1 for returning from the signal handler, one for entering the signal handler if it was not already sleeping in kernelspace, and possibly one for sem_post). Still, it's probably faster than mucking around with ptrace, which is not designed for high-performance use...
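A minimal sketch of that protocol, assuming Linux and glibc; SNAPSHOT_SIG, tl_value and the function names are hypothetical. For brevity it skips the "store what you want in globals" step (it always returns both the context and one fixed __thread variable), and it reads the __thread variable directly in the handler instead of calling pthread_getspecific; for a variable defined in the main executable that is a plain memory access.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <ucontext.h>

#define SNAPSHOT_SIG (SIGRTMIN + 1)   /* hypothetical choice of RT signal */

static __thread int tl_value;         /* hypothetical TLS variable to read */

static pthread_mutex_t request_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t reply_sem;
static ucontext_t reply_ctx;          /* target's registers */
static int reply_tl_value;            /* target's tl_value */

/* Runs in the target thread. */
static void snapshot_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)info;
    memcpy(&reply_ctx, uctx, sizeof reply_ctx);  /* registers and more */
    reply_tl_value = tl_value;                   /* plain TLS read */
    sem_post(&reply_sem);                        /* async-signal-safe */
}

/* Installs the handler once, before any request. */
int snapshot_init(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = snapshot_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sem_init(&reply_sem, 0, 0) != 0)
        return -1;
    return sigaction(SNAPSHOT_SIG, &sa, NULL);
}

/* Returns the target's tl_value and copies its context, or -1 on error. */
int snapshot_thread(pthread_t target, ucontext_t *ctx_out)
{
    int v;
    pthread_mutex_lock(&request_lock);            /* one request at a time */
    if (pthread_kill(target, SNAPSHOT_SIG) != 0) {
        pthread_mutex_unlock(&request_lock);
        return -1;
    }
    while (sem_wait(&reply_sem) != 0)             /* retry on EINTR */
        ;
    v = reply_tl_value;
    if (ctx_out != NULL)
        *ctx_out = reply_ctx;
    pthread_mutex_unlock(&request_lock);
    return v;
}

static atomic_bool target_ready, target_stop;

/* Example target thread: sets its tl_value, then idles until stopped. */
static void *target_main(void *arg)
{
    struct timespec tick = { 0, 1000000 };       /* 1 ms */
    tl_value = *(int *)arg;
    atomic_store(&target_ready, true);
    while (!atomic_load(&target_stop))
        nanosleep(&tick, NULL);   /* interruption by the signal is harmless */
    return NULL;
}
```

On x86-64 glibc, the target's program counter is then ctx.uc_mcontext.gregs[REG_RIP]. Only the general-purpose registers in uc_mcontext.gregs are safe to read from the copy: its fpregs pointer refers to the signal frame, which no longer exists once the handler returns.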
I have just started coding device drivers and am new to threading. I went through many documents to get an idea about threads, but I still have some doubts.
What is a kernel thread?
How does it differ from a user thread?
What is the relationship between the two kinds of threads?
How can I implement kernel threads?
Where can I see the output of the implementation?
Can anyone help me?
Thanks.
A kernel thread is a task_struct with no userspace components.
Besides the lack of userspace, it has different ancestors (kthreadd kernel thread instead of the init process) and is created by a kernel-only API instead of sequences of clone from fork/exec system calls.
Kernel threads normally have kthreadd as their parent. Apart from that, kernel threads are as independent of one another as userspace processes are.
Use the kthread_run function/macro from the linux/kthread.h header. You will most probably have to write a kernel module in order to call it, so you should take a look at the Linux Device Drivers book.
If you are referring to the text output of your implementation (via printk calls), you can see this output in the kernel log using the dmesg command.
A kernel thread is a kernel task running only in kernel mode; it usually has not been created by the fork() or clone() system calls. Examples are kworker and kswapd.
You probably should not implement kernel threads if you don't know what they are.
Google gives many pages about kernel threads, e.g. Frey's page.
user threads & stack:
Each thread has its own stack, so it can use its own local variables; threads share global variables, which live in the .data or .bss sections of a Linux executable.
Because threads share global variables, we use synchronization mechanisms such as mutexes when accessing or modifying globals in a multithreaded application. Local variables live on each thread's individual stack, so they need no synchronization.
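A small sketch of that split (hypothetical names): each thread counts into a local variable on its own stack without any locking, then adds its result to a shared global under a mutex.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define ITERS 100000

static long shared_total = 0;      /* global (.bss): shared by all threads */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *count_up(void *arg)
{
    long local = 0;                /* on this thread's own stack */
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        local++;                   /* private: no lock needed */
    pthread_mutex_lock(&total_lock);
    shared_total += local;         /* shared: must hold the mutex */
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Starts NTHREADS counters, waits for them, returns the shared total. */
long run_counters(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, count_up, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return shared_total;
}
```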
Kernel threads
Kernel threads emerged from the need to run kernel code in process context. Kernel threads are the basis of the workqueue mechanism. Essentially, a kernel thread is a thread that only runs in kernel mode and has no user address space or other user attributes.
To create a kernel thread, use kthread_create(). The new thread is created stopped; start it with wake_up_process(), or use kthread_run(), which creates and wakes it in one step:
#include <linux/kthread.h>
struct task_struct *kthread_create(int (*threadfn)(void *data),
                                   void *data, const char namefmt[], ...);
kernel threads & stack:
Kernel threads are used to perform background tasks for the kernel, such as pdflush threads, workqueue threads, etc.
Kernel threads are basically new processes without a user address space (they can be created using the clone() call with the required flags), which means they can't switch to user space. Kernel threads are schedulable and preemptible, just like normal processes.
Kernel threads have their own stacks, which they use to manage local info.
More about kernel stacks:-
https://www.kernel.org/doc/Documentation/x86/kernel-stacks
Since you're comparing kernel threads with user[land] threads, I assume you mean something like the following.
The normal way of implementing threads nowadays is to do it in the kernel, so those can be considered "normal" threads. It's however also possible to do it in userland, using signals such as SIGALRM, whose handler will save the current process state (registers, mostly) and change them to another one previously saved. Several OSes used this as a way to implement threads before they got proper kernel thread support. They can be faster, since you don't have to go into kernel mode, but in practice they've faded away.
There are also cooperative userland threads, where one thread runs until it calls a special function (usually called yield), which then switches to another thread in a similar way as with SIGALRM above. The advantage here is that the program is in total control, which can be useful when you have timing concerns (a game, for example). You also don't have to care much about thread safety. The big disadvantage is that only one thread can run at a time, so this method is also uncommon now that processors have multiple cores.
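A minimal sketch of cooperative userland threads using glibc's getcontext/makecontext/swapcontext (obsolescent in POSIX but still available on Linux); real implementations often use hand-written assembly instead. Two hypothetical "threads" switch only when they call coop_yield(), and a small round-robin scheduler picks who runs next.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define NCOOP 2
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx, coop_ctx[NCOOP];
static char stacks[NCOOP][STACK_SIZE];
static int current;                 /* index of the running coop thread */
static int finished[NCOOP];
static char trace[64];              /* records the interleaving */
static int trace_len;

/* Voluntarily hand control back to the scheduler. */
static void coop_yield(void)
{
    swapcontext(&coop_ctx[current], &sched_ctx);
}

static void coop_body(int id)
{
    for (int i = 0; i < 3; i++) {
        trace[trace_len++] = (char)('A' + id);
        coop_yield();
    }
    finished[id] = 1;   /* returning resumes uc_link, i.e. the scheduler */
}

/* Runs both coop threads to completion; returns the recorded interleaving. */
const char *run_coop(void)
{
    for (int i = 0; i < NCOOP; i++) {
        getcontext(&coop_ctx[i]);
        coop_ctx[i].uc_stack.ss_sp = stacks[i];
        coop_ctx[i].uc_stack.ss_size = sizeof stacks[i];
        coop_ctx[i].uc_link = &sched_ctx;
        makecontext(&coop_ctx[i], (void (*)(void))coop_body, 1, i);
    }
    int alive = NCOOP;
    while (alive > 0) {
        alive = 0;
        for (current = 0; current < NCOOP; current++) {
            if (!finished[current]) {
                swapcontext(&sched_ctx, &coop_ctx[current]);
                alive += !finished[current];
            }
        }
    }
    trace[trace_len] = '\0';
    return trace;
}
```

Because nothing preempts a coop thread, the interleaving is fully deterministic: each thread appends one letter and yields, so run_coop() returns "ABABAB".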
Kernel threads are implemented in the kernel. Perhaps you meant how to use them? The most common way is to call pthread_create.