If I have multiple threads and I would like to wait for a thread to finish, why is this undefined according to the pthread_join() function?
For example, the code below shows thread 1 and 2 wait for thread 0:
void* thread1(void* arg){
    ..
    void* retval1;
    pthread_join(thread0, &retval1); /* thread0 is a pthread_t visible to both threads */
    return NULL;
}
void* thread2(void* arg){
    ..
    void* retval2;
    pthread_join(thread0, &retval2);
    return NULL;
}
Why would this behaviour be undefined or, in other words, not possible?
The pthread_t object will generally point to some allocated data pertaining to the thread. Among this data will generally be the return value of the thread. pthread_join will read the return value and free the thread data.
If you pthread_join the same pthread id twice, you essentially have a double-free. The pointed-to data might be invalid on the second join, but it also could have been reused in an unpredictable fashion by another thread.
The results are difficult/impossible to reason about. Hence the UB.
Basically, the retval of thread0 is stored until one thread calls pthread_join. After that, the retval is no longer available.
C++ has std::shared_future which explicitly is designed so multiple threads can wait for a single result, which shows your idea isn't strange.
pthread_join is a resource-identifier-freeing operation, like close on file descriptors, free on allocated memory, unlink on filenames, etc. All such operations are inherently subject to "double-free"/"use-after-free" bugs if the identifier (here, the particular pattern of bits making up the pthread_t value) can be used again in the future. As such, the behavior has to be either completely undefined or subject to conditions that make it a serious programming error, to the point where it might as well be undefined.
Note that in your mental model you might be thinking of both pthread_join calls "starting before the thread exits", and thereby taking place during the time when the pthread_t identifier is still valid. However formally there is no ordering between them, nor can there be. There is no observable distinction between the state of "already blocked in pthread_join" and "just about to call pthread_join".
I have a program that has a static pthread_key_t key variable and a function that calls pthread_key_create(&key, &cleanup_function) when the first thread is started. I don't want to call pthread_key_delete in the cleanup routine because it will run whenever a thread exits instead of when all threads have exited. This will cause problems if another thread calls get_specific or set_specific later on.
My question is: Can I just completely leave out pthread_key_delete? Will this (having called pthread_key_create without calling pthread_key_delete afterward) cause any memory leak when the program eventually comes to a halt? Is it mandatory to call pthread_key_delete after creating a pthread_key_t? Or does the key just go into the garbage collector or get destructed once the entire program ends?
static pthread_key_t key = NULL;
...
static void cleanup(void *value) {
    ...
    if (thread_exit_callback) {
        thread_exit_callback(value);
    }
    free(value);
}

static void *start(void *value) {
    ...
    if (key == NULL){
        pthread_key_create(&key, &cleanup);
    }
    pthread_setspecific(key, value);
    ...
}
The program looks something like this.
I have a program that has a static pthread_key_t key variable and a function that calls pthread_key_create(&key, &cleanup_function) when the first thread is started.
I presume you mean the second thread, or the first started via pthread_create(). Every process has one thread at startup; all code that is executed is executed in a thread.
Also, "when" is a bit fiddly. It is cleanest for the initial thread to create the key, and that before launching any other threads. Otherwise, you need to engage additional synchronization to avoid a data race.
I don't want to call pthread_key_delete in the cleanup routine because it will run whenever a thread exits instead of when all threads have exited.
The documentation refers to your "cleanup function" as a destructor, a term I find more indicative of the intended purpose: not generic cleanup, but appropriate teardown for the thread-specific values associated with the key. Nevertheless, you have come to the right conclusion: the destructor function, if given, should tear down only a value, not the key.
Can I just completely leave out pthread_key_delete? Will this (having called pthread_key_create without calling pthread_key_delete afterward) cause any memory leak when the program eventually comes to a halt?
That is unspecified, so it is safest to ensure that pthread_key_delete() is called. If all threads using the key are joinable then you can do that after joining them. You could also consider registering an exit handler to do that, but explicitly destroying the key is better style where that can reasonably be performed.
With that said, you will not leak any ordinary memory if you fail to destroy the key, as the system reclaims all resources belonging to a process when that process exits. The main risk is that some of the resources associated with the key are more broadly scoped, and that those would leak. Named semaphores and memory-mapped files are examples of such resources, though I have no knowledge of those specific kinds of resources being associated with thread-specific data keys.
does the key just go into the garbage collector or get destructed once the entire program ends?
C implementations do not typically implement garbage collection, though there are add-in garbage collector frameworks for C. But as described above, the resources belonging to a process do get released by the system when the process terminates.
For an assignment, I need to use sched_yield() to synchronize threads. I understand a mutex lock/conditional variables would be much more effective, but I am not allowed to use those.
The only functions we are allowed to use are sched_yield(), pthread_create(), and pthread_join(). We cannot use mutexes, locks, semaphores, or any type of shared variable.
I know sched_yield() is supposed to relinquish the processor so another thread can run, so it should move the calling thread to the back of the run queue.
The code below is supposed to print 'abc' in order and then the newline after all three threads have executed. I looped sched_yield() in functions b() and c() because it wasn't working as I expected, but I'm pretty sure all that is doing is delaying the printing because a function is running so many times, not because sched_yield() is working.
The server it needs to run on has 16 CPUs. I saw somewhere that sched_yield() may immediately assign the thread to a new CPU.
Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sched.h>

void* a(void*);
void* b(void*);
void* c(void*);

int main( void ){
    pthread_t a_id, b_id, c_id;
    pthread_create(&a_id, NULL, a, NULL);
    pthread_create(&b_id, NULL, b, NULL);
    pthread_create(&c_id, NULL, c, NULL);
    pthread_join(a_id, NULL);
    pthread_join(b_id, NULL);
    pthread_join(c_id, NULL);
    printf("\n");
    return 0;
}

void* a(void* ret){
    printf("a");
    return ret;
}

void* b(void* ret){
    for(int i = 0; i < 10; i++){
        sched_yield();
    }
    printf("b");
    return ret;
}

void* c(void* ret){
    for(int i = 0; i < 100; i++){
        sched_yield();
    }
    printf("c");
    return ret;
}
There are 4 cases:
a) The scheduler doesn't use multiplexing (e.g. doesn't use "round robin" but uses "highest priority thread that can run does run", or "earliest deadline first", or ...), and sched_yield() does nothing.
b) The scheduler does use multiplexing in theory, but you have more CPUs than threads so the multiplexing doesn't actually happen, and sched_yield() does nothing. Note: With 16 CPUs and 2 threads, this is likely what you'd get for the default scheduling policy on an OS like Linux - the sched_yield() just does a "Hrm, no other thread I could use this CPU for, so I guess the calling thread can keep using the same CPU!".
c) The scheduler does use multiplexing and there are more threads than CPUs, but to improve performance (avoid task switches) the scheduler designer decided that sched_yield() does nothing.
d) sched_yield() does cause a task switch (yielding the CPU to some other task), but that is not enough to do any kind of synchronization on its own - e.g. you'd need an atomic variable or something for the actual synchronization, maybe like "while( atomic_variable_not_set_by_other_thread ) { sched_yield(); }". Note that with an atomic variable (introduced in C11) it'd work without sched_yield() - the sched_yield() (if it does anything) merely makes the busy waiting less awful/wasteful.
Essentially I'm unsure of how, using only sched_yield(), to synchronize these threads given everything I could find and troubleshoot with online.
That would be because sched_yield() is not well suited to the task. As I wrote in comments, sched_yield() is about scheduling, not synchronization. There is a relationship between the two, in the sense that synchronization events affect which threads are eligible to run, but that goes in the wrong direction for your needs.
You are probably looking at the problem from the wrong end. You need each of your threads to wait to execute until it is their turn, and for them to do that, they need some mechanism to convey information among them about whose turn it is. There are several alternatives for that, but if "only sched_yield()" is taken to mean that no library functions other than sched_yield() may be used for that purpose then a shared variable seems the expected choice. The starting point should therefore be how you could use a shared variable to make the threads take turns in the appropriate order.
Flawed starting point
Here is a naive approach that might spring immediately to mind:
/* FLAWED */
/* FLAWED */
void *b(void *data){
    char *whose_turn = data;
    while (*whose_turn != 'b') {
        // nothing?
    }
    printf("b");
    *whose_turn = 'c';
    return NULL;
}
That is, the thread executes a busy loop, monitoring the shared variable to await it taking a value signifying that the thread should proceed. When it has done its work, the thread modifies the variable to indicate that the next thread may proceed. But there are several problems with that, among them:
1. Supposing that at least one other thread writes to the object designated by *whose_turn, the program contains a data race, and therefore its behavior is undefined. As a practical matter, a thread that once entered the body of the loop in that function might loop infinitely, notwithstanding any action by other threads.
2. Without making additional assumptions about thread scheduling, such as a fairness policy, it is not safe to assume that the thread that will make the needed modification to the shared variable will be scheduled in bounded time.
3. While a thread is executing the loop in that function, it prevents any other thread from executing on the same core, yet it cannot make progress until some other thread takes action. To the extent that we can assume preemptive thread scheduling, this is an efficiency issue and contributory to (2). However, if we assume neither preemptive thread scheduling nor the threads being scheduled each on a separate core, then this is an invitation to deadlock.
Possible improvements
The conventional and most appropriate way to do that in a pthreads program is with the use of a mutex and condition variable. Properly implemented, that resolves the data race (issue 1) and it ensures that other threads get a chance to run (issue 3). If that leaves no other threads eligible to run besides the one that will modify the shared variable then it also addresses issue 2, to the extent that the scheduler is assumed to grant any CPU to the process at all.
But you are forbidden to do that, so what else is available? Well, you could make the shared variable _Atomic. That would resolve the data race, and in practice it would likely be sufficient for the wanted thread ordering. In principle, however, it does not resolve issue 3, and as a practical matter, it does not use sched_yield(). Also, all that busy-looping is wasteful.
But wait! You have a clue in that you are told to use sched_yield(). What could that do for you? Suppose you insert a call to sched_yield() in the body of the busy loop:
/* (A bit) better */
void* b(void *data){
    char *whose_turn = data;
    while (*whose_turn != 'b') {
        sched_yield();
    }
    printf("b");
    *whose_turn = 'c';
    return NULL;
}
That resolves issues 2 and 3, explicitly affording the possibility for other threads to run and putting the calling thread at the tail of the scheduler's thread list. Formally, it does not resolve issue 1 because sched_yield() has no documented effect on memory ordering, but in practice, I don't think it can be implemented without a (full) memory barrier. If you are allowed to use atomic objects then combining an atomic shared variable with sched_yield() would tick all three boxes. Even then, however, there would still be a bunch of wasteful busy-looping.
Final remarks
Note well that pthread_join() is a synchronization function, thus, as I understand the task, you may not use it to ensure that the main thread's output is printed last.
Note also that I have not spoken to how the main() function would need to be modified to support the approach I have suggested. Changes would be needed for that, and they are left as an exercise.
If I look at the implementation of pthread_equal it looks as follows:
int
__pthread_equal (pthread_t thread1, pthread_t thread2)
{
return thread1 == thread2;
}
weak_alias (__pthread_equal, pthread_equal)
with
typedef unsigned long int pthread_t;
POSIX documentation says pthread_equal is thread-safe. But the call to pthread_equal copies and therefore accesses the thread1 and thread2 variables. If those variables are global and changed at this moment by another thread, this would lead to undefined behavior.
So should pthread_t not be an atomic? Or does it behave atomically which is ensured in some other way?
This implementation of pthread_equal (which is going to be specific to the POSIX implementation it goes with) does not access any variables except local ones. In C, arguments are always passed by value. thread1 and thread2 are local variables in the function that take their values from the expressions the caller uses when making the call.
With that said, even if they weren't, there would be no problem of thread-unsafety here. Rather, if the caller accessed pthread_t objects that could be changed concurrently by other threads without using appropriate synchronization mechanisms to preclude that, it's a data race bug in the caller, not a thread-safety matter in the callee.
When we say an operation is thread safe, we make certain assumptions on the code that invokes that operation. Those assumptions include making sure the inputs to the operation are stable and that their values remain valid and stable until the operation completes. Otherwise, nothing would be thread safe.
Is it possible for two threads to use a single function "ThreadProc" as its thread procedure when CreateThread() is used?
HANDLE thread1 = CreateThread( NULL,    // Choose default security
    0,                                  // Default stack size
    (LPTHREAD_START_ROUTINE)&ThreadProc,
    // Routine to execute. I want this routine to be different each time
    // as I want each thread to perform a different functionality.
    (LPVOID) &i,                        // Thread parameter
    0,                                  // Immediately run the thread
    &dwThreadId                         // Thread Id
);

HANDLE thread2 = CreateThread( NULL,    // Choose default security
    0,                                  // Default stack size
    (LPTHREAD_START_ROUTINE)&ThreadProc,
    // Routine to execute. I want this routine to be different each time
    // as I want each thread to perform a different functionality.
    (LPVOID) &i,                        // Thread parameter
    0,                                  // Immediately run the thread
    &dwThreadId                         // Thread Id
);
Would the above code create two threads, each with the same functionality (since the thread procedure for both threads is the same)? Am I doing it correctly?
If it is possible, would there be any synchronization issues, since both threads use the same thread procedure?
Please help me with this. I am really confused and could not find anything about it on the internet.
It is fine to use the same function as a thread entry point for multiple threads.
However, from the posted code, the address of i is being passed to both threads. If either thread modifies this memory while the other reads it, there is a race condition on i. The declaration of i is not shown, but it is probably a local variable; this is dangerous, as the threads require that i exist for their lifetime, and if it does not, the threads will have a dangling pointer. It is common practice to dynamically allocate thread arguments and have each thread free its own arguments.
Yes, it is very well possible to have multiple (concurrent) threads that start with the same entry point.
Apart from the fact that the OS/threading library specifies the signature and calls it, there is nothing special about a thread entry point function. It can be used to start off multiple threads with the same caveats as for calling any other function from multiple threads: you need synchronization to access non-atomic shared variables.
Each thread uses its own stack area, but that gets allocated by the OS before the thread procedure gets invoked, so by the time the thread procedure is called, all the special actions needed to create and start a new thread have already taken place.
Whether the threads are using the same code or not is irrelevant. It has no effect whatsoever on synchronization. It behaves precisely the same as if they were different functions. The issues with potential races is the same.
You probably don't want to pass both threads the same pointers. That will likely lead to data races. (Though we'd have to see the code to know for sure.)
Your code is right. There are no synchronization issues between the two threads caused by sharing a thread procedure. If they need synchronization, it is because they change the same global variable, not because they use the same thread procedure.
Is it possible to read the registers or thread local variables of another thread directly, that is, without requiring to go to the kernel? What is the best way to do so?
You can't read the registers, which wouldn't be useful anyway. But reading thread local variables from another thread is easily possible.
Depending on the architecture (e.g. strong memory ordering like on x86_64), you can safely do it even without synchronization, provided that the read value doesn't affect in any way the thread it belongs to. A scenario would be displaying a thread-local counter or similar.
Specifically on Linux on x86_64, as you tagged, you could do it like this:
// A thread-local variable. GCC extension (__thread); since C11 there is
// also the standard _Thread_local.
__thread int some_tl_var;

// The pointer to the thread local. In itself NOT thread local, as it will be
// read from the outside world.
struct thread_data {
    int *psome_tl_var;
    ...
};

// The function started by pthread_create. The pointer needs to be initialized
// here, and NOT when the storage for the objects used by the thread is allocated
// (otherwise it would point to the thread local of the controlling thread).
void *thread_run(void *arg) {
    struct thread_data *pdata = arg;
    pdata->psome_tl_var = &some_tl_var;
    // Now do some work...
    // ...
    return NULL;
}

void start_threads(void) {
    ...
    struct thread_data other_thread_data[NTHREADS];
    pthread_t tids[NTHREADS];
    for (int i = 0; i < NTHREADS; ++i) {
        pthread_create(&tids[i], NULL, thread_run, &other_thread_data[i]);
    }
    // Now you can access each some_tl_var as
    int value = *(other_thread_data[i].psome_tl_var);
    ...
}
I used something similar for displaying statistics about worker threads. It is even easier in C++: if you create objects around your threads, just make the pointer to the thread local a field in your thread class and access it with a member function.
Disclaimer: This is non portable, but it works on x86_64, linux, gcc and may work on other platforms too.
There's no way to do it without involving the kernel, and in fact I don't think it could be meaningful to read them anyway without some sort of synchronization. If you don't want to use ptrace (which is ugly and non-portable) you could instead choose one of the realtime signals to use for a "send me your registers/TLS" message. The rough idea is:
Lock a global mutex for the request.
Store the information on what data you want (e.g. a pthread_key_t or a special value meaning registers) from the thread in global variables.
Signal the target thread with pthread_kill.
In the signal handler (which should have been installed with sigaction and SA_SIGINFO) use the third void * argument to the signal handler (which really points to a ucontext_t) to copy that ucontext_t to the global variable used to communicate back to the requesting thread. This will give it all the register values, and a lot more. Note that TLS is a bit more tricky since pthread_getspecific is not async-signal-safe and technically not legal to run in this context...but it probably works in practice.
The signal handler posts a semaphore (this is the ONLY async-signal-safe synchronization function offered by POSIX) indicating to the requesting thread that it's done, and returns.
The requesting thread finishes by waiting on the semaphore, then reads the data and unlocks the request mutex.
Note that this will involve at least 1 transition to kernelspace (pthread_kill) in the requesting thread (and maybe another in sem_wait), and 1-3 in the target thread (1 for returning from the signal handler, one for entering the signal handler if it was not already sleeping in kernelspace, and possibly one for sem_post). Still it's probably faster than mucking around with ptrace which is not designed for high-performance usage...