Sharing dynamically-allocated char * between threads - c

My application has two threads (thread A and thread B) that read and write from/to a global structure like the following:
struct global_data_s {
    pthread_mutex_t mutex;
    uint8_t a;
    char *string;
};
Thread A writes to global_data_s through a function:
set_global_data(struct global_data_s *in);
which sets the global structure according to the input global_data_s *in passed to the function (which manages the mutex lock/unlock mechanism).
Thread B reads from global_data_s with:
get_global_data(struct global_data_s *out);
This works as expected for plain value members (like uint8_t a). The actual global data is only accessible through the get/set functions, so each thread can freely work on its own copy of the global structure. The set function takes a local copy of the structure allocated in thread A and overwrites the global structure, while the get function copies the global data into a local struct in thread B. Everything is protected by the mutex lock/unlock mechanism.
I found myself in a bit of a pickle, though, since string is allocated dynamically (via malloc) by thread A and contains a JSON-formatted string that is constantly updated (e.g. every second).
Thread B is actually a web server that needs to get string and send it as soon as a GET request is made (this must happen for every incoming GET request).
I fear the whole "work on a copy of the global shared structure" approach fails here, since dynamic memory is involved.
My point is, when and where should I free the allocated string? Freeing the local copy of string also frees the string referenced by the shared global structure, since both pointers point to the same address.
I hope I made myself clear, but please, feel free to ask if anything is unclear.
Thank you in advance.

Global variables and threading are not good bedfellows, so keep them apart when that is architecturally feasible. But when you are forced to use them together, the global becomes a shared resource that only one thread may use at a time. It then becomes a simple matter of writing code that lets the threads play nicely with each other, and share.
Enter the mutex (or some other thread-safety device).
In this case, you are using a mutex. Each thread must take ownership of the mutex before it can execute the code that accesses the shared resource. With that in mind, every event that requires a thread to access this global struct member, whether to read or to write, must first acquire the mutex. If no other thread owns it: take ownership, use the resource, release ownership. If another thread already owns it, pthread_mutex_lock simply blocks until the mutex is released; only if you use a non-blocking call such as pthread_mutex_trylock is it the calling code's responsibility to retry until the mutex is free. Once it is free, again: take ownership, use the resource, release ownership.
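Applied to the string in the question, one common approach (a sketch, not the only possible design) is to have both accessor functions deep-copy the string while holding the mutex. Each thread then owns, and frees, its own buffer, and the shared pointer never escapes the lock. The struct and the names set_global_data/get_global_data come from the question. The static instance g, the int return codes, and the dup_string helper (equivalent to POSIX strdup) are assumptions made for illustration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct global_data_s {
    pthread_mutex_t mutex;
    uint8_t a;
    char *string;
};

/* Hypothetical shared instance; only the two functions below touch it. */
static struct global_data_s g = { PTHREAD_MUTEX_INITIALIZER, 0, NULL };

/* Hypothetical helper: heap copy of s (same job as POSIX strdup). */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Thread A: store a private copy of in->string. The old shared string is
   freed after unlocking; that is safe because readers only ever keep copies. */
int set_global_data(const struct global_data_s *in)
{
    char *copy = NULL;
    if (in->string != NULL && (copy = dup_string(in->string)) == NULL)
        return -1;                 /* out of memory: shared data unchanged */

    pthread_mutex_lock(&g.mutex);
    char *old = g.string;
    g.a = in->a;
    g.string = copy;
    pthread_mutex_unlock(&g.mutex);

    free(old);
    return 0;
}

/* Thread B: copy everything out under the lock. The caller owns
   out->string afterwards and must free() it. */
int get_global_data(struct global_data_s *out)
{
    int rc = 0;
    pthread_mutex_lock(&g.mutex);
    out->a = g.a;
    out->string = NULL;
    if (g.string != NULL && (out->string = dup_string(g.string)) == NULL)
        rc = -1;
    pthread_mutex_unlock(&g.mutex);
    return rc;
}
```

In the web server, thread B would call get_global_data for each GET request, send out.string, then free(out.string). Thread A can free or reuse its own input buffer right after calling set_global_data, because the shared structure keeps its own copy. The cost is one allocation and copy per request, which is usually acceptable for a small JSON string updated about once per second.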

Related

Mutex on pointers to a shared variable

I am new to thread programming. I know that mutexes are used to protect access to shared data in a multi-threaded program.
Suppose I have one thread with variable a and a second one with the pointer variable p that holds the address of a. Is the code safe if, in the second thread, I lock a mutex before I modify the value of a using the pointer variable? From my understanding it is safe.
Can you confirm? And also can you provide the reason why it is true or why it is not true?
I am working with C and pthreads.
The general rule when doing multithreading is that shared variables among threads that are read and written need to be accessed serially, which means that you must use some sort of synchronization primitive. Mutexes are a popular choice, but no matter what you end up using, you just need to remember that before reading from or writing to a shared variable, you need to acquire a lock to ensure consistency.
So, as long as every thread in your code agrees to always use the same lock before accessing a given variable, you're all good.
Now, to answer your specific questions:
Is the code safe if, in the second thread, I lock a mutex before I modify the value of a using the pointer variable?
It depends. How do you read a on the first thread? The first thread needs to lock the mutex too before accessing a in any way. If both threads lock the same mutex before reading or writing the value of a, then it is safe.
It's safe because the region of code between the mutex lock and unlock is exclusive (as long as every thread respects the rule that before doing Y, they need to acquire lock X), since only one thread at a time can have the lock.
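To illustrate the rule, here is a minimal sketch in which one thread increments a directly and the other increments it only through a pointer p. Both lock the same mutex, so no increment is lost. The names a_lock, ITERATIONS, use_directly, use_via_pointer and run_both are hypothetical.

```c
#include <pthread.h>

#define ITERATIONS 100000

static pthread_mutex_t a_lock = PTHREAD_MUTEX_INITIALIZER;
static int a;                       /* the shared variable */

/* First thread: uses a directly, but only while holding a_lock. */
static void *use_directly(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&a_lock);
        a++;
        pthread_mutex_unlock(&a_lock);
    }
    return NULL;
}

/* Second thread: uses a only through the pointer p, under the SAME mutex. */
static void *use_via_pointer(void *arg)
{
    int *p = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&a_lock);
        (*p)++;
        pthread_mutex_unlock(&a_lock);
    }
    return NULL;
}

/* Runs both threads to completion and returns the final value of a. */
int run_both(void)
{
    pthread_t t1, t2;
    a = 0;
    pthread_create(&t1, NULL, use_directly, NULL);
    pthread_create(&t2, NULL, use_via_pointer, &a);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a;
}
```

run_both() returns 2 * ITERATIONS. If either thread skipped the lock, or used a different mutex, increments could be lost and the result could come out lower.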
As for this comment:
And if the mutex is locked before p is used, then both a and p are protected? The conclusion being that every memory reference present in a section where a mutex is locked is protected, even if the memory is indirectly referenced?
Mutexes don't protect memory regions or references; they protect a region of code. Whatever you do between locking and unlocking is exclusive, and that's it. So, if every thread accessing or modifying either a or p locks the same mutex before and unlocks afterwards, then as a side effect you have synchronized accesses.
TL;DR Mutexes allow you to write code that never executes in parallel, you get to choose what that code does - a remarkably popular pattern is to access and modify shared variables.

PThreads Address Space

Is there any way to force threads to have independent address spaces? I'd like to have many threads running loops using local variables - but it seems they all share the same variables.
for example
for (i = args->start; i < args->end; i++) {
    printf("%d\n", i);
    if (quickFind(getReverse(array[i]), 0, size - 1)) {
        printf("%s\n", array[i]);
        //strcpy(array[i], "");
    }
}
i seems to be shared across threads.
Threads share the memory space of their parent process; that is their defining characteristic. If you don't want that, you can create a new process, which will have its own address space, using fork().
If you do decide to use fork() remember that, on successfully creating a child process, it returns 0 to the child process and the PID of the child process to the parent process.
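A minimal sketch of the fork() approach, assuming a POSIX system: the child writes to its copy of a variable and the parent's copy is unaffected. The function name parent_value_after_fork is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's value of counter after the child has changed its own
   copy, or -1 if fork() fails. */
int parent_value_after_fork(void)
{
    int counter = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0) {                /* fork() returned 0: this is the child */
        counter = 99;              /* changes only the child's address space */
        _exit(0);
    }
    /* fork() returned the child's PID: this is the parent */
    waitpid(pid, NULL, 0);
    return counter;                /* still 1 */
}
```

Because the processes no longer share memory, any results the child computes have to come back through an explicit channel such as a pipe, shared memory, or the exit status.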
Short answer: Yes it's possible for each thread to have its own copy of the variable i.
Long answer:
All threads share the same address space and the OS does not provide any protection to prevent one thread from accessing memory used by another. However, the memory can be partitioned so that it will only be accessed by a single thread rather than being shared by all threads.
By default each thread receives its own stack. So if you allocate a variable on the stack then it will typically only be accessed by a single thread. Note that it is possible to pass a pointer to a stack variable from one thread to another, but this is not recommended and could be the source of the sort of problems that you are seeing.
Another way for a thread to receive its own copy of a variable is using thread local storage. This allows each thread to have its own copy of a global variable.
In summary, although threads share an address space they can work on private data. But you need to be careful with how you are sharing data between threads and avoid data races.
Just have each thread call into the function separately. Each invocation of a function gets its own instances of all local variables. If this wasn't true, recursion wouldn't work.
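A minimal sketch of that advice, modeled on the loop in the question: each thread runs its own call of the thread function, so the loop counter i lives on that call's stack. The range_args struct, sum_range, parallel_sum and the summing work are hypothetical stand-ins for the question's args->start/args->end loop.

```c
#include <pthread.h>

#define NTHREADS 4

/* Hypothetical per-thread arguments, mirroring args->start / args->end. */
struct range_args {
    int start, end;
    long sum;                      /* result written back by the thread */
};

static void *sum_range(void *p)
{
    struct range_args *args = p;
    long sum = 0;
    /* i is a local (automatic) variable: every call of sum_range, and so
       every thread, gets its own i on its own stack. */
    for (int i = args->start; i < args->end; i++)
        sum += i;
    args->sum = sum;
    return NULL;
}

/* Sums 0..n-1 by splitting the range across NTHREADS threads. */
long parallel_sum(int n)
{
    pthread_t tids[NTHREADS];
    struct range_args args[NTHREADS];  /* one struct per thread, not shared */
    for (int t = 0; t < NTHREADS; t++) {
        args[t].start = n * t / NTHREADS;
        args[t].end = n * (t + 1) / NTHREADS;
        pthread_create(&tids[t], NULL, sum_range, &args[t]);
    }
    long total = 0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].sum;
    }
    return total;
}
```

Note that args lives on parallel_sum's stack; that is fine here only because every thread is joined before parallel_sum returns.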
If you want to be really lazy, and make no design changes whatsoever (not really recommended), you can modify the declaration of i to something like __thread int i, so that every thread will have its own instance of that variable.
If you were using OpenMP instead of Posix threads, you could also say #pragma omp threadprivate(i) before the first usage of i.
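A small sketch of the thread-local alternative, using C11's _Thread_local (the standard spelling of what __thread provides in GCC). Each thread increments what looks like a single global, but actually gets its own instance. The names counter, bump and run_tls_demo are hypothetical.

```c
#include <pthread.h>

/* One instance per thread; "__thread int counter;" behaves the same in GCC. */
static _Thread_local int counter;

/* Increments this thread's counter *arg times, then reports its final value
   back through the same int. */
static void *bump(void *arg)
{
    int *io = arg;
    for (int k = 0; k < *io; k++)
        counter++;
    *io = counter;
    return NULL;
}

/* Runs two threads with different workloads; each sees only its own counter. */
void run_tls_demo(int *first, int *second)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump, first);
    pthread_create(&t2, NULL, bump, second);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

With a plain global instead, both threads would update one unsynchronized counter and race.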

Reading registers or thread local variables of another thread

Is it possible to read the registers or thread local variables of another thread directly, that is, without requiring to go to the kernel? What is the best way to do so?
You can't read the registers, which wouldn't be useful anyway. But reading thread local variables from another thread is easily possible.
Depending on the architecture (e.g. one with strong memory ordering, like x86_64), you can safely do it even without synchronization, provided that the value read doesn't affect in any way the thread it belongs to. An example would be displaying a thread-local counter or similar.
Specifically, on Linux/x86_64 as you tagged, you could do it like this:
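#include <pthread.h>
#include <stdio.h>
#define NTHREADS 4  // hypothetical thread count
// A thread local variable. __thread is a GCC extension; C11 spells it
// _Thread_local and C++11 spells it thread_local.
__thread int some_tl_var;
// The pointer to the thread local. In itself NOT thread local, as it will be
// read from the outside world.
struct thread_data {
    int *psome_tl_var;
    // ...
};
// Added: make sure the pointers are set before they are read, and keep the
// threads (and hence their thread locals) alive while they are read.
static pthread_barrier_t ready, done;
// The function started by pthread_create. The pointer needs to be initialized
// here, and NOT when the storage for the objects used by the thread is allocated
// (otherwise it would point to the thread local of the controlling thread).
void *thread_run(void *arg) {
    struct thread_data *pdata = arg;
    pdata->psome_tl_var = &some_tl_var;
    pthread_barrier_wait(&ready);
    // Now do some work...
    // ...
    pthread_barrier_wait(&done);
    return NULL;
}
void start_threads(void) {
    pthread_t tids[NTHREADS];
    struct thread_data other_thread_data[NTHREADS];
    pthread_barrier_init(&ready, NULL, NTHREADS + 1);
    pthread_barrier_init(&done, NULL, NTHREADS + 1);
    for (int i = 0; i < NTHREADS; ++i) {
        pthread_create(&tids[i], NULL, thread_run, &other_thread_data[i]);
    }
    pthread_barrier_wait(&ready);
    // Now you can access each some_tl_var as
    for (int i = 0; i < NTHREADS; ++i) {
        int value = *(other_thread_data[i].psome_tl_var);
        printf("thread %d: %d\n", i, value);
    }
    pthread_barrier_wait(&done);
    for (int i = 0; i < NTHREADS; ++i) {
        pthread_join(tids[i], NULL);
    }
}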
I used something similar for displaying some statistics about worker threads. It is even easier in C++: if you create objects around your threads, just make the pointer to the thread local a field in your thread class and access it with a member function.
Disclaimer: This is non-portable, but it works on x86_64, Linux, gcc and may work on other platforms too.
There's no way to do it without involving the kernel, and in fact I don't think it could be meaningful to read them anyway without some sort of synchronization. If you don't want to use ptrace (which is ugly and non-portable) you could instead choose one of the realtime signals to use for a "send me your registers/TLS" message. The rough idea is:
Lock a global mutex for the request.
Store the information on what data you want (e.g. a pthread_key_t or a special value meaning registers) from the thread in global variables.
Signal the target thread with pthread_kill.
In the signal handler (which should have been installed with sigaction and SA_SIGINFO) use the third void * argument to the signal handler (which really points to a ucontext_t) to copy that ucontext_t to the global variable used to communicate back to the requesting thread. This will give it all the register values, and a lot more. Note that TLS is a bit more tricky since pthread_getspecific is not async-signal-safe and technically not legal to run in this context...but it probably works in practice.
The signal handler posts a semaphore (this is the ONLY async-signal-safe synchronization function offered by POSIX) indicating to the requesting thread that it's done, and returns.
The requesting thread finishes by waiting on the semaphore, then reads the data and unlocks the request mutex.
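A condensed sketch of those steps, assuming Linux/glibc (SIGRTMIN, ucontext_t). To keep it short, it skips storing a description of the requested data in globals: the handler always copies back both the whole ucontext_t and one hypothetical __thread variable. All names (request_lock, reply_ready, on_request, request_tls, target_thread, demo and so on) are illustrative.

```c
#define _GNU_SOURCE               /* ucontext_t, SA_SIGINFO, SIGRTMIN on glibc */
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

static pthread_mutex_t request_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t reply_ready;          /* posted by the handler when the copy is done */
static ucontext_t reply_context;   /* registers (and more) of the target thread */
static int reply_tls;              /* copy of the target's thread-local value */
static __thread int my_tls_value;  /* hypothetical thread-local to inspect */

/* Runs in the target thread. Reading a __thread variable here is the
   "technically not legal, probably works" part described above. */
static void on_request(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    reply_context = *(ucontext_t *)ctx;
    reply_tls = my_tls_value;
    sem_post(&reply_ready);        /* async-signal-safe "done" notification */
}

/* One-time setup: install the handler with SA_SIGINFO. */
int install_request_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_request;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sem_init(&reply_ready, 0, 0) != 0)
        return -1;
    return sigaction(SIGRTMIN, &sa, NULL);
}

/* Requesting side: returns the target's thread-local value. Its registers
   are left in reply_context; decoding uc_mcontext is architecture-specific. */
int request_tls(pthread_t target)
{
    pthread_mutex_lock(&request_lock);           /* one request at a time */
    pthread_kill(target, SIGRTMIN);
    while (sem_wait(&reply_ready) == -1 && errno == EINTR)
        ;                                        /* retry if interrupted */
    int value = reply_tls;
    pthread_mutex_unlock(&request_lock);
    return value;
}

/* Demo target: publishes a value, then blocks until told to quit. */
static sem_t target_started, target_quit;

static void *target_thread(void *arg)
{
    my_tls_value = *(int *)arg;
    sem_post(&target_started);
    while (sem_wait(&target_quit) == -1 && errno == EINTR)
        ;                          /* the request signal interrupts sem_wait */
    return NULL;
}

/* Starts a target whose thread-local holds value, then reads it back. */
int demo(int value)
{
    pthread_t tid;
    sem_init(&target_started, 0, 0);
    sem_init(&target_quit, 0, 0);
    if (install_request_handler() != 0)
        return -1;
    pthread_create(&tid, NULL, target_thread, &value);
    sem_wait(&target_started);
    int got = request_tls(tid);
    sem_post(&target_quit);
    pthread_join(tid, NULL);
    return got;
}
```

The requester only reads reply_tls after sem_wait returns, so the handler's writes are visible to it; the request mutex keeps two requesters from overwriting each other's reply.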
Note that this will involve at least 1 transition to kernelspace (pthread_kill) in the requesting thread (and maybe another in sem_wait), and 1-3 in the target thread (1 for returning from the signal handler, one for entering the signal handler if it was not already sleeping in kernelspace, and possibly one for sem_post). Still it's probably faster than mucking around with ptrace which is not designed for high-performance usage...

Does a local variable in a thread function have a separate copy per thread?

I have declared some local variable in one function like this:
void* thread_function (void* parameter)
{
    struct parameter * thread_data = (struct parameter *)parameter;
    char buffer[20];
    int temp;
    return NULL;
}
If I create two threads and one thread updates buffer and temp, will that affect the other thread?
In other words, with two threads, will there be two copies of every local variable?
EDIT: Then in which cases do I need thread-specific data, i.e. pthread_setspecific and related functions?
These variables are allocated on the stack, and each thread has its own stack: these variables are private to each thread (they are not shared). (See this answer for more details.)
If you assign thread_data to a global pointer, for example, other threads will be able to access thread_data via the global pointer.
Thread specific data (e.g. pthread_setspecific) is used to create variables that are global, but still specific to each thread (not shared): They are thread-specific global variables.
You need to use thread specific variables when you want global variables, but don't want to share them between threads.
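A minimal sketch of thread-specific data with the POSIX key API: thread_counter() can be called from anywhere, like an accessor for a global, yet each thread gets its own int. The names counter_key, thread_counter, count_up and run_tsd_demo are hypothetical; pthread_once, pthread_key_create, pthread_getspecific and pthread_setspecific are the standard POSIX calls.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t counter_key;  /* one key; each thread stores its own value */
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_counter_key(void)
{
    /* free() is called on each thread's non-NULL value when that thread exits */
    pthread_key_create(&counter_key, free);
}

/* Global-style accessor returning a per-thread int, allocated on first use. */
static int *thread_counter(void)
{
    pthread_once(&counter_once, make_counter_key);
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        if (p == NULL)
            abort();
        pthread_setspecific(counter_key, p);
    }
    return p;
}

/* Counts up *arg times on this thread's counter, then reports it back. */
static void *count_up(void *arg)
{
    int *io = arg;
    for (int k = 0; k < *io; k++)
        (*thread_counter())++;
    *io = *thread_counter();
    return NULL;
}

void run_tsd_demo(int *first, int *second)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, count_up, first);
    pthread_create(&t2, NULL, count_up, second);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

Each thread ends with exactly its own count, and each thread's int is freed automatically by the key's destructor when that thread exits.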
It's not that each thread has its own copy, it's that each instance of a function invocation has its own copy of all automatic (i.e. local non-static) variables, regardless of whether the instances are in the same thread or different threads. This is true if the instances come into existence due to invocation in different threads, recursive invocation, mutual/indirect recursion, or even invocation from an asynchronous signal handler. Note that although C99 does not specify threads (C11 later added them), the relevant section in the standard is probably 5.2.3 Signals and interrupts:
Functions shall be implemented such that they may be interrupted at any time by a signal, or may be called by a signal handler, or both, with no alteration to earlier, but still active, invocations' control flow (after the interruption), function return values, or objects with automatic storage duration. All such objects shall be maintained outside the function image (the instructions that compose the executable representation of a function) on a per-invocation basis.
This makes it explicit that each invocation must have its own storage for automatic variables.
Local variables are stored in stack memory, which is private to a thread.
Therefore they are not shared between threads: there will be an independent copy of each variable in each thread.
Update
Whether you would want to share data between threads really boils down to a design question: what are your threads doing? Are their efforts coordinated, or are they simply workers processing a queue?
The main thing to consider is synchronization of shared data. Variables that are shared between threads are variables that can change value unexpectedly (within a single thread) and so need to be treated as such. I would suggest that you err on the side of not sharing, unless you have a specific reason to do so.

Modifying a threads data from outside the thread

How does one modify a thread's data from outside the thread?
If a thread is running a function that loops for the runtime of the application, how can its data be set or changed?
How does one call functions which modify a specific thread's functions?
Where do these functions belong?
The advantage and disadvantage of threads is that they share the memory space with every other thread in the process. You can use any form of data transfer you would use in a single-threaded application to pass data between segments of your application. However, in a multi-threaded application you must use some type of synchronization to ensure data integrity, and design it carefully to avoid deadlocks.
If the "thread's data" you want to modify from outside is in the form of local variables in a function running in the thread, or thread-specific data created with the __thread extension, then the only way you can modify them from outside (modulo code with UB that's technically just trashing memory) is by having the thread take the addresses of its variables and store them somewhere other threads can see (either in a global variable, or at a location passed in via the thread start function's void * argument).
Also note that, as rerun pointed out, you have to use some method of synchronization if multiple threads are accessing the same data. The standard/portable synchronization methods are the pthread ones (pthread_mutex_lock etc., and since C11 also <threads.h> and <stdatomic.h>), but you can also use assembly or compiler intrinsics (like __sync_* in gcc).
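A minimal sketch of the approach described above: the worker publishes the address of one of its locals through the start function's void * argument, and another thread modifies it, with every access guarded by the same mutex. A condition variable is added so neither side has to spin. The names struct mailbox, looping_worker and set_from_outside are hypothetical, and the cond-wait loop stands in for the thread's real long-running loop.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical control block, passed through the start function's void * arg. */
struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int *target;                   /* address of the worker's local, once published */
    int stop;
};

static void *looping_worker(void *arg)
{
    struct mailbox *mb = arg;
    int setting = 0;               /* the thread's own local variable */

    pthread_mutex_lock(&mb->lock);
    mb->target = &setting;         /* publish its address */
    pthread_cond_broadcast(&mb->changed);
    while (!mb->stop)              /* stand-in for the thread's long-running loop */
        pthread_cond_wait(&mb->changed, &mb->lock);
    int seen = setting;            /* read under the lock the writer used */
    mb->target = NULL;             /* setting is about to go out of scope */
    pthread_mutex_unlock(&mb->lock);
    return (void *)(intptr_t)seen;
}

/* Changes the worker's local from outside and returns what the worker saw. */
int set_from_outside(int new_value)
{
    struct mailbox mb = { .target = NULL, .stop = 0 };
    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);

    pthread_t tid;
    pthread_create(&tid, NULL, looping_worker, &mb);

    pthread_mutex_lock(&mb.lock);
    while (mb.target == NULL)      /* wait until the address is published */
        pthread_cond_wait(&mb.changed, &mb.lock);
    *mb.target = new_value;        /* modify the worker's variable */
    mb.stop = 1;
    pthread_cond_broadcast(&mb.changed);
    pthread_mutex_unlock(&mb.lock);

    void *ret;
    pthread_join(tid, &ret);
    pthread_mutex_destroy(&mb.lock);
    pthread_cond_destroy(&mb.changed);
    return (int)(intptr_t)ret;
}
```

The worker clears the published pointer before its local goes out of scope, so no other thread can be left holding a dangling address.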
