Modifying a thread's data from outside the thread - C

How does one modify a thread's data from outside the thread?
If a thread is running a function that loops for the runtime of the application, how can its data be set or changed?
How does one call functions which modify a specific thread's functions?
Where do these functions belong?

The advantage and disadvantage of threads is that they share the memory space with every other thread in the process. You can use any form of data transfer you would use in a single-threaded application to pass data between segments of your application. However, in a multi-threaded application you must use some type of synchronization to ensure data integrity, and you must use it carefully to avoid deadlocks.

If the "thread's data" you want to modify from outside is in the form of local variables in a function running in the thread, or thread-specific data created with the __thread extension, then the only way you can modify it from outside (modulo code with UB that's technically just trashing memory) is by having the thread take the addresses of its variables and store them somewhere other threads can see them (either in a global variable, or at a location passed in via the thread start function's void * argument).
Also note that, as rerun pointed out, you have to use some method of synchronization if multiple threads are accessing the same data. On POSIX systems the standard, portable synchronization methods are the pthread ones (pthread_mutex_lock etc.), but you can also use assembly or compiler intrinsics (like __sync_* in gcc).
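As a rough sketch of the approach described above, the state can live in a struct whose address is handed to the thread through the void * argument, with small setter functions kept next to the struct. The names worker_state, worker_set_value, worker_stop and run_worker_demo are made up for illustration; they are not from the question.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state; all names here are illustrative only. */
struct worker_state {
    pthread_mutex_t lock;
    int value;      /* data the worker reads on every iteration */
    bool stop;      /* set by another thread to end the loop */
    int last_seen;  /* last value the worker observed */
};

/* Thread body: loops until asked to stop, touching its data under the lock. */
static void *worker(void *arg)
{
    struct worker_state *st = arg;
    for (;;) {
        pthread_mutex_lock(&st->lock);
        st->last_seen = st->value;
        bool done = st->stop;
        pthread_mutex_unlock(&st->lock);
        if (done)
            break;
    }
    return NULL;
}

/* Setters that any other thread may call to change the worker's data. */
static void worker_set_value(struct worker_state *st, int v)
{
    pthread_mutex_lock(&st->lock);
    st->value = v;
    pthread_mutex_unlock(&st->lock);
}

static void worker_stop(struct worker_state *st)
{
    pthread_mutex_lock(&st->lock);
    st->stop = true;
    pthread_mutex_unlock(&st->lock);
}

/* Starts a worker, changes its data from outside, stops it,
 * and returns the last value the worker saw. */
int run_worker_demo(void)
{
    struct worker_state st = { .value = 0, .stop = false, .last_seen = -1 };
    pthread_t tid;

    pthread_mutex_init(&st.lock, NULL);
    if (pthread_create(&tid, NULL, worker, &st) != 0) {
        pthread_mutex_destroy(&st.lock);
        return -1;
    }
    worker_set_value(&st, 42);
    worker_stop(&st);
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&st.lock);
    return st.last_seen;
}
```

The worker here spins on the lock only to keep the sketch short; a real loop would normally do work or block between checks. The setters belong with the struct definition, since they are the only code that needs to know which lock guards which field.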

Related

Making a C library thread safe

I am writing a shared library in C. I know C functions are not thread safe.
My library routines look like this:
struct lib_handle {
....
};
int lib_init(struct lib_handle **handle);
int lib_process(struct lib_handle *handle);
....
....
Every method takes a pointer to lib_handle object. All the state is stored inside this structure. No global variables are used.
I assume that if each thread creates its own lib_handle instance, multiple threads can use the library functions. Since each thread has its own handle, everything should work.
I haven't validated this assumption yet. I am wondering what you think about this design. Do you think I can state that my library is thread safe, given that each thread has its own handle?
Any help would be great!
That will make the data/state of the library thread safe.
But you also have to make sure that your library uses threadsafe functions from other libraries, e.g. use strtok_r instead of strtok.
Threads work in a shared memory space. Unsafe objects are objects that can be accessed by multiple threads simultaneously. So if you have a separate lib_handle object for each thread, there will be no problems.
If each thread has a private lib_handle object, your library should be fully thread safe; if you let several threads share lib_handle objects, the person using your library can still make a thread-safe program if she uses your library correctly (i.e. your library is not inherently thread-unsafe, which it would be if you used e.g. global variables).
If this mode of operation (shared lib_handle) is of interest, you should clearly separate the functions which only read the state of lib_handle from those which manipulate it. The former need a read lock and the latter need a write lock (the calling scope must handle this).
For what it is worth I have used the pattern you describe quite a lot, and like it.

POSIX threads and global variables in C on Linux

If I have two threads and one global variable (one thread constantly loops to read the variable; the other constantly loops to write to it), would anything happen that shouldn't (e.g. exceptions, errors)? If it does, what is a way to prevent this? I was reading about mutex locks and that they allow exclusive access to a variable to one thread. Does this mean that only that thread can read and write to it and no other?
Would anything happen that shouldn't?
It depends in part on the type of the variables. If the variable is, say, a string (long array of characters), then if the writer and the reader access it at the same time, it is completely undefined what the reader will see.
This is why mutexes and other coordinating mechanisms are provided by pthreads.
Does this mean that only that thread can read and write to it and no other?
Mutexes ensure that at most one thread that is using the mutex can have permission to proceed. All other threads using the same mutex will be held up until the first thread releases the mutex. Therefore, if the code is written properly, at any time, only one thread will be able to access the variable. If the code is not written properly, then:
one thread might access the variable without checking that it has permission to do so
one thread might acquire the mutex and never release it
one thread might destroy the mutex without notifying the other
None of these is desirable behaviour, but the mere existence of a mutex does not prevent any of these happening.
Nevertheless, your code could reasonably use a mutex carefully and then the access to the global variable would be properly controlled. While it has permission via the mutex, either thread could modify the variable, or just read the variable. Either will be safe from interference by the other thread.
Does this mean that only that thread can read and write to it and no other?
It means that only one thread can read or write to the global variable at a time.
The two threads will not race with each other to access the global variable, nor will they access it at the same time.
In short, access to the global variable is synchronized.
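A minimal sketch of the setup in the question (one writer loop, one reader loop, one global) with every access going through the same mutex could look like this. The names shared_value, shared_lock, WRITES and run_global_demo are illustrative assumptions.

```c
#include <pthread.h>

static int shared_value;   /* the global both threads touch */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

enum { WRITES = 100000 };

/* Writer: constantly updates the global, holding the lock only briefly. */
static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < WRITES; i++) {
        pthread_mutex_lock(&shared_lock);
        shared_value++;
        pthread_mutex_unlock(&shared_lock);
    }
    return NULL;
}

/* Reader: constantly reads the global until the writer has finished. */
static void *reader(void *arg)
{
    int *last = arg;
    int seen;
    do {
        pthread_mutex_lock(&shared_lock);
        seen = shared_value;
        pthread_mutex_unlock(&shared_lock);
    } while (seen < WRITES);
    *last = seen;
    return NULL;
}

/* Runs both threads and returns the last value the reader observed. */
int run_global_demo(void)
{
    int last = 0;
    pthread_t w, r;

    shared_value = 0;
    pthread_create(&r, NULL, reader, &last);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return last;
}
```

Both threads must take the same mutex; the lock protects nothing if one side skips it.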
First: in C/C++, an unsynchronized read/write of a variable does not generate any exception or system error, BUT it can cause application-level errors, mostly because you are unlikely to fully understand how the memory is accessed, and whether the access is atomic, unless you look at the generated assembly. On a multi-core CPU you are likely to get hard-to-debug race conditions when you access shared memory without synchronization.
Hence
Second: you should always use synchronization, such as mutex locks, when dealing with shared memory. A mutex lock is cheap, so it will not really impact performance if done right. Rule of thumb: hold the lock for as short a time as possible, such as just for the duration of reading/incrementing/writing the shared memory.
However, from your description, it sounds like one of your threads is doing nothing BUT waiting for the shared memory to change state before doing something. That is a bad multi-threaded design which burns CPU unnecessarily, so
Third: look at using semaphores (sem_init/sem_wait/sem_post) for synchronization between your threads if you are trying to send a "message" from one thread to the other.
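To illustrate the semaphore idea, here is a small sketch in which the receiving thread sleeps in sem_wait until the sender posts, instead of polling. It assumes Linux-style unnamed POSIX semaphores (sem_init); the mailbox struct and function names are invented for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical one-slot mailbox. */
struct mailbox {
    sem_t ready;    /* posted by the sender once message is valid */
    int message;
    int received;
};

static void *receiver(void *arg)
{
    struct mailbox *mb = arg;

    /* Blocks without burning CPU; retry if a signal interrupts the wait. */
    while (sem_wait(&mb->ready) == -1 && errno == EINTR)
        continue;
    mb->received = mb->message;
    return NULL;
}

/* Sends one message to a waiting thread and returns what it received. */
int run_semaphore_demo(void)
{
    struct mailbox mb = { .message = 0, .received = -1 };
    pthread_t tid;

    if (sem_init(&mb.ready, 0, 0) != 0)   /* pshared = 0, initial count 0 */
        return -1;
    if (pthread_create(&tid, NULL, receiver, &mb) != 0) {
        sem_destroy(&mb.ready);
        return -1;
    }
    mb.message = 7;         /* written before the post, read after the wait */
    sem_post(&mb.ready);
    pthread_join(tid, NULL);
    sem_destroy(&mb.ready);
    return mb.received;
}
```

A condition variable paired with a mutex would serve the same purpose; the semaphore is used here only because it is what the answer suggests.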
As others already said, when communicating between threads through "normal" objects you have to take care of race conditions. Besides mutexes and other lock structures that are relatively heavyweight, the new C standard (C11) provides atomic types and operations that are guaranteed to be race-free. Most modern processors provide instructions for such types, and many modern compilers (in particular GCC on Linux) already provide their own interfaces for such operations.
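A short sketch of the C11 route, assuming a compiler that ships <stdatomic.h>: two threads increment one shared counter with no mutex at all. The names hits, count_hits and run_atomic_demo are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int hits;   /* shared counter, no mutex needed */

static void *count_hits(void *arg)
{
    (void)arg;
    for (int i = 0; i < 50000; i++)
        atomic_fetch_add(&hits, 1);   /* indivisible read-modify-write */
    return NULL;
}

/* Two threads add 50000 each; returns the final count. */
int run_atomic_demo(void)
{
    pthread_t a, b;

    atomic_store(&hits, 0);
    pthread_create(&a, NULL, count_hits, NULL);
    pthread_create(&b, NULL, count_hits, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&hits);
}
```

With a plain int and hits++ in place of atomic_fetch_add, increments from the two threads could be lost.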
If the threads truly are only one producer and only one consumer, then (barring compiler bugs)
1) marking the variable as volatile, and
2) making sure that it is correctly aligned, so as to avoid interleaved fetches and stores
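will allow you to do this without locking on many platforms. Note that volatile by itself gives no atomicity or ordering guarantee in the C11 memory model, so this relies on the behavior of the target hardware and compiler.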

c/c++ joining processes?

I am new to threads and processes.
I have code that works fine right now with forking the code into multiple processes. However each process needs to add to a global variable, but from what I read, each time the process forks, it takes a copy of the global, and adds them independently. Is there a way to join them, like you can with threads?
Different processes can communicate and exchange data via shared memory.
On Linux, see man shm_overview for attaching a memory segment to several processes, and man sem_overview for the semaphore library for controlling parallel access.
You should define a struct with two fields, one for your global and one for a semaphore. Then, before any forking occurs, create some shared memory in the parent process big enough to hold this struct and initialize one there. In the children, map in the shared memory so they can access the global. All processes, parent and children, should obey the rules of the semaphore when accessing the global.
To avoid unnecessary blocking which can hurt performance, try not to hold the semaphore too long. When reading the global, make a quick copy of it in a process and use that, rather than holding the semaphore for the entire time you are using its value. Likewise, when changing the global, prepare your changes ahead of time (before you grab the semaphore) and, once you have the semaphore, copy them in all at once. Sometimes your work depends on reading and writing the global without it changing in between being read and written. In this case, some blocking may be inevitable.
It is not clear what platform you are on, but all major PC and server platforms (Windows, Linux/Unix/Mac OS) have support for shared memory and semaphores. The APIs may be different, but the functionality you need is there.

pthread Linux data runtime

I have threads in my application that wait on a condition variable. When the condition is met, a thread starts to work and reads some data. My data is a global variable. Is it possible to pass data at runtime without using global data? I read something about thread-specific data but I don't know if it is useful in this case. Thank you!
Yes, you can pass this to your thread routine: pthread_create(thread, attr, function, USER_ARG). Simply create a struct for the data the thread needs to execute.
Where USER_ARG is stored in memory is important. You will often want to use the free store (malloc it) for the argument; otherwise the pointer may dangle once the calling function returns, and the new thread may read garbage from, or corrupt, the stack of the thread which called pthread_create.
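Something along these lines, as a minimal sketch: the argument struct is allocated with malloc, handed to pthread_create, and freed after the join. The struct job and the function names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread input and output. */
struct job {
    int a;
    int b;
    int sum;
};

static void *do_job(void *arg)
{
    struct job *j = arg;   /* the USER_ARG pointer arrives here */
    j->sum = j->a + j->b;
    return j;              /* hand the struct back through pthread_join */
}

/* Runs one job in a new thread and returns its result, or -1 on failure. */
int run_job_demo(int a, int b)
{
    struct job *j = malloc(sizeof *j);
    if (j == NULL)
        return -1;
    j->a = a;
    j->b = b;
    j->sum = 0;

    pthread_t tid;
    if (pthread_create(&tid, NULL, do_job, j) != 0) {
        free(j);
        return -1;
    }

    void *ret = NULL;
    pthread_join(tid, &ret);
    struct job *done = ret;
    int sum = done->sum;
    free(done);            /* heap storage outlives the creating scope */
    return sum;
}
```

Because the struct is on the heap, it stays valid even if the function that called pthread_create returns before the thread runs; whoever ends up owning the pointer is responsible for freeing it.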

Does a local variable in a thread function have a separate copy per thread?

I have declared some local variables in a function like this:
void* thread_function (void* parameter)
{
    struct parameter * thread_data = (struct parameter *)parameter;
    char buffer[20];
    int temp;
    return NULL;
}
If I create two threads and buffer and temp are updated in one thread, will it affect the other thread?
I mean, if there are two threads, will there be two copies of every local variable?
EDIT: Then in which case do I need to use thread-specific data, i.e. pthread_setspecific and all such stuff?
These variables are allocated on the stack, and each thread has its own stack: these variables are private to each thread (they are not shared). (See this answer for more details.)
If you assign thread_data to a global pointer, for example, other threads will be able to access thread_data via the global pointer.
Thread specific data (e.g. pthread_setspecific) is used to create variables that are global, but still specific to each thread (not shared): They are thread-specific global variables.
You need to use thread specific variables when you want global variables, but don't want to share them between threads.
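A rough sketch of thread-specific data with the pthread key API: id_key is visible everywhere like a global, yet each thread reads back only the value it stored itself. All names (id_key, current_id, tsd_thread, run_tsd_demo) are made up for the example.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t id_key;   /* one key, one value slot per thread */
static pthread_once_t id_key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is called on each thread's stored pointer when it exits. */
    pthread_key_create(&id_key, free);
}

/* Reads the calling thread's own value; no pointer is passed around. */
static int current_id(void)
{
    int *p = pthread_getspecific(id_key);
    return p ? *p : -1;
}

static void *tsd_thread(void *arg)
{
    int *inout = arg;

    pthread_once(&id_key_once, make_key);   /* create the key exactly once */
    int *slot = malloc(sizeof *slot);
    if (slot == NULL)
        return NULL;
    *slot = *inout;
    pthread_setspecific(id_key, slot);

    *inout = current_id() * 10;   /* report back what this thread saw */
    return NULL;
}

/* Two threads store 1 and 2; returns the sum of what they report. */
int run_tsd_demo(void)
{
    int x = 1, y = 2;
    pthread_t a, b;

    pthread_create(&a, NULL, tsd_thread, &x);
    pthread_create(&b, NULL, tsd_thread, &y);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return x + y;   /* 10 + 20 when each thread saw only its own value */
}
```

This is mainly useful when deeply nested code needs per-thread state and threading a pointer through every call would be awkward.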
It's not that each thread has its own copy, it's that each instance of a function invocation has its own copy of all automatic (i.e. local non-static) variables, regardless of whether the instances are in the same thread or different threads. This is true if the instances come into existence due to invocation in different threads, recursive invocation, mutual/indirect recursion, or even invocation from an asynchronous signal handler. Note that while the C standard does not specify threads, the relevant section in the standard is probably 5.2.3 Signals and interrupts:
Functions shall be implemented such that they may be interrupted at any time by a signal, or may be called by a signal handler, or both, with no alteration to earlier, but still active, invocations' control flow (after the interruption), function return values, or objects with automatic storage duration. All such objects shall be maintained outside the function image (the instructions that compose the executable representation of a function) on a per-invocation basis.
This makes it explicit that each invocation must have its own storage for automatic variables.
Local variables are stored in stack memory, which is private to a thread.
Therefore they are not shared between threads: there will be an independent copy of each variable in each thread.
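To see this with the function from the question, here is a small sketch that fills in a body and runs it in two threads. The fields of struct parameter and the run_locals_demo helper are assumptions, since the question does not show them.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical definition; the question does not show the real one. */
struct parameter {
    int id;
    char result[20];
};

static void *thread_function(void *parameter)
{
    struct parameter *thread_data = parameter;
    char buffer[20];                    /* one copy per invocation */
    int temp = thread_data->id * 100;   /* likewise private */

    snprintf(buffer, sizeof buffer, "thread-%d", temp);
    strcpy(thread_data->result, buffer);   /* both arrays hold 20 bytes */
    return NULL;
}

/* Returns 1 if neither thread's locals were disturbed by the other. */
int run_locals_demo(void)
{
    struct parameter p1 = { 1, "" }, p2 = { 2, "" };
    pthread_t a, b;

    pthread_create(&a, NULL, thread_function, &p1);
    pthread_create(&b, NULL, thread_function, &p2);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return strcmp(p1.result, "thread-100") == 0 &&
           strcmp(p2.result, "thread-200") == 0;
}
```

Each thread writes only to its own buffer and temp, and to the struct it was given, so no locking is needed here. Locking would become necessary if both threads were handed the same struct parameter.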
Update
Whether you would want to share data between threads really boils down to a design question: what are your threads doing? Are their efforts coordinated, or are they simply workers processing a queue?
The main thing to consider is synchronization of shared data. Variables that are shared between threads are variables that can change value unexpectedly (within a single thread) and so need to be treated as such. I would suggest that you err on the side of not sharing, unless you have a specific reason to do so.

Resources