If I have a threaded function, and from inside it I call another function (that is located in a separate .c file), does each instance of the threaded function get its own version of the called function? Example:
file 1
void* threaded_func(void* data) {
function2(...);
}
file 2
int a;
int function2(...) {
int b;
}
In my example, would int a be the same int a for all threads? What about for variables I define inside function2 (i.e int b)?
Some context:
I'm creating a simple HTTP server and my function2 is a serve_client function which should only return when the connection is closed. I'm having a problem that I think is due to some variables in file2 (int a) being overwritten for each new thread. I think my problem is me expecting a C source file to behave like a class would in Java.
does each instance of the threaded function get its own version of the called function?
Loosely speaking, that's correct, provided the function doesn't use any global variables (e.g., int a; in your example) or any variable with static storage duration. If a is shared by multiple threads, that's likely to be the source of your problem.
If a needs to be shared among threads, then you need to synchronize access to it. If a needs to be unique to each thread, you need thread-local storage, e.g., C11's _Thread_local or gcc's __thread.
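To make the difference concrete, here is a minimal sketch, assuming a C11 compiler and POSIX threads. The names tl_a and demo are made up for illustration; a, b, function2 and threaded_func mirror the question.

```c
#include <assert.h>
#include <pthread.h>

int a;                          /* file scope: one copy shared by all threads */
static _Thread_local int tl_a;  /* C11: one copy per thread (hypothetical name) */

static int function2(int id)
{
    int b = id * 10;  /* automatic: lives on the calling thread's stack */

    a = id;           /* overwrites the single shared copy */
    tl_a = id;        /* touches only the calling thread's copy */
    return b + tl_a;
}

static void *threaded_func(void *data)
{
    int *slot = data;

    *slot = function2(*slot);
    return NULL;
}

/* Runs one worker to completion and reports what the main thread sees. */
int demo(int *shared_seen, int *local_seen)
{
    pthread_t t;
    int slot = 7;
    int rc;

    a = 1;
    tl_a = 1;
    rc = pthread_create(&t, NULL, threaded_func, &slot);
    assert(rc == 0);      /* demo only; real code should handle failure */
    pthread_join(t, NULL);

    *shared_seen = a;     /* the worker's write to a is visible here */
    *local_seen = tl_a;   /* the main thread's tl_a is untouched */
    return slot;
}
```

After the join, the main thread sees the worker's value in a but still sees its own value in tl_a. With several workers running at once, the unsynchronized write to a would be a data race, which is the kind of overwriting described in the question.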
A C function is just machine code, and it can be run by several threads (or processes). What matters is the data used by that function. C doesn't have closures (you can emulate them with callbacks, that is, a function that takes additional client data). As usr answered, read also about thread storage duration with thread_local. Each thread has its own call stack.
You should first read some POSIX thread tutorial.
You need to care about synchronization, notably using mutexes (locking to serialize access to shared global variables) and semaphores. Read also about ACID transactions. Sometimes you can use atomic types and atomic operations.
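As a rough illustration of serializing access with a mutex, assuming POSIX threads (the names counter, worker and run_counter_demo are invented for this sketch):

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4
#define NITER    10000

static long counter;   /* shared by every thread */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < NITER; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;     /* read-modify-write, safe only under the lock */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts NTHREADS workers, waits for them, and returns the final count. */
long run_counter_demo(void)
{
    pthread_t t[NTHREADS];

    counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);   /* demo only; real code should handle failure */
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Without the lock, increments from different threads can be lost; with it, the result is NTHREADS * NITER every time.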
BTW, you might want to use some HTTP server library like libonion.
You should look into the source code of existing free software projects (e.g. on github).
On Linux, read also about pthreads(7), nptl(7) and futex(7) and sem_overview(7) and clone(2).
Related
I can think of two ways a thread-safe library can be used:
One is having a global instance of the library protected by a mutex, which is initialised by the main thread and used by worker threads, like so:
mutex g_lib_mutex;
lib_t g_lib;
thread:
lock(&g_lib_mutex);
/* use lib */
unlock(&g_lib_mutex);
main:
lib_init(&g_lib);
start_threads(thread);
lock(&g_lib_mutex);
/* use lib */
unlock(&g_lib_mutex);
join_threads();
lib_close(&g_lib);
The other, is for every thread to have a local instance of the library, something like this:
thread:
lib_t g_lib;
lib_init(&g_lib);
/* use lib */
lib_close(&g_lib);
main:
start_threads(thread);
lib_t g_lib;
lib_init(&g_lib);
/* use lib */
lib_close(&g_lib);
Which of these ways is more correct / preferable?
Do I need to protect library calls with a global mutex in both cases?
I was trying to use libmysql and POSIX message queues in a multi-threaded application when this question crossed my mind.
Generally, only initialize a library once. Remember, all threads happen in the same process' memory space, so whatever you do to any global variables in thread X is true for all threads. Library initialization should happen only once per process.
Now, whether library calls are thread safe or must be protected by mutexes is a question of your library. Modern libraries should have definite documentation on what functions you're allowed to call from multiple threads. If that info is missing you can either
assume the worst and guard everything that calls into the library, or changes something the library deals with, with a single global mutex, or
read the source code of the library to figure out what might go wrong where, introduce protective measures (mutexes/conditions) accordingly, and make sure that no one uses a different version of the library (where things might be different), or
improve the documentation, send that patch to the upstream developers asking them to verify that what you document in thread-(un)safety is intentional and matches reality, (documentation patches are, for any project that I know of, always welcome) or
modify the library itself to be thread safe (making yourself a hero).
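The "assume the worst" option can be sketched roughly as follows, assuming POSIX threads. lib_init and lib_next_id here are hypothetical stand-ins for a library that is not thread-safe; they are not the API of libmysql or any real library.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a library that is not thread-safe. */
static int lib_last_id;
static void lib_init(void)    { lib_last_id = 0; }
static int  lib_next_id(void) { return ++lib_last_id; }

static pthread_once_t  lib_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Wrapper: initialise once per process, then serialise every call. */
int safe_next_id(void)
{
    int id;

    pthread_once(&lib_once, lib_init);
    pthread_mutex_lock(&lib_lock);
    id = lib_next_id();
    pthread_mutex_unlock(&lib_lock);
    return id;
}

static void *worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < 1000; i++)
        (void)safe_next_id();
    return NULL;
}

/* Runs four workers, then returns the next id handed out afterwards. */
int run_wrapper_demo(void)
{
    pthread_t t[4];

    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);   /* demo only; real code should handle failure */
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return safe_next_id();
}
```

pthread_once takes care of the "initialize only once per process" part no matter which thread gets there first, and the single mutex is the coarse but safe default when the library's documentation says nothing.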
All the threads share the same memory. For example, a change to a global variable in one thread is reflected in another thread. Since each thread has its own stack, the local variables created inside a thread are unique to it. In that case, why do we need a thread-specific data mechanism? Can't the same be achieved with auto storage variables inside the thread function?
Kindly clarify!!!.
BR
Rj
Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.
It's not required but it's rather handy. Some functions like rand and strtok use static storage duration information which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
unsigned int seed = (unsigned int) time (NULL);
int r = rand_r (&seed);
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
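which, in an implementation that keeps the seed in thread-local storage, maintains a thread-specific seed for you. (The C standard does not require rand to be thread-safe, so this depends on the library.) The same applies to the information strtok keeps about its position within the string it's processing.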
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now you could create that information in the thread function, but how is the rand function going to know about its address without it being passed down? And what if rand is called 87 stack levels down? That's an awful lot of levels to be transferring a pointer through.
And, even if you do something like:
void *pthread_fn (void *unused) {
int seed;
rand_set_seed_location (&seed);
:
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)
Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).
The best example of thread-specific data is errno. When a call to some function in the C library fails, errno is set, and you can check it to find the reason for the failure. Without thread-specific data, it would be impossible to port these functions to a multi-threaded environment, because errno could be set by another thread before you check it.
As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
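For comparison, this is roughly what the caller-supplied-buffer style looks like with gmtime_r, assuming a POSIX system. The helper name utc_year is made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Returns the UTC year for *t, or -1 on failure. The struct tm lives on
 * the caller's stack, so no other thread or library call can trash it. */
int utc_year(const time_t *t)
{
    struct tm buf;

    assert(t != NULL);
    if (gmtime_r(t, &buf) == NULL)
        return -1;
    return buf.tm_year + 1900;
}
```

Because the result buffer belongs to the caller, nothing needs to be hidden in static or thread-specific storage.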
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.
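A small sketch of that logging case, assuming C11 _Thread_local and POSIX threads. All names here (current_client, report_error, serve, context_demo) are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Per-thread context, set once near the top of request handling. */
static _Thread_local const char *current_client = "(unknown)";

/* Deep in the call chain: no client parameter, yet it can name the client. */
static void report_error(char *out, size_t n, const char *what)
{
    snprintf(out, n, "%s while serving %s", what, current_client);
}

struct job {
    const char *client;
    char        msg[64];
};

static void *serve(void *arg)
{
    struct job *j = arg;

    current_client = j->client;
    report_error(j->msg, sizeof j->msg, "timeout");
    return NULL;
}

/* Returns 1 if each thread's message names its own client. */
int context_demo(void)
{
    struct job jobs[2] = { { "alice", "" }, { "bob", "" } };
    pthread_t t[2];

    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, serve, &jobs[i]);
        assert(rc == 0);   /* demo only; real code should handle failure */
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    return strcmp(jobs[0].msg, "timeout while serving alice") == 0
        && strcmp(jobs[1].msg, "timeout while serving bob") == 0;
}
```

The intermediate functions between serve and report_error (none here, but potentially dozens) never have to carry the client through their parameter lists.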
Is thread-local storage used for anything other than making global and static variables local to a thread? Is it useful in any new code that we write?
TLS can certainly be useful in new code. If you ever want a global variable which needs to be specific to each thread, (like errno in C/C++), thread-local-storage is the way to go.
Thread-specific singleton objects are one use. In a multi-threaded web server where each thread handles one request, there is often TLS data (such as the request URL or a database connection, that is, resources that may be needed at any point during request handling) that can then be accessed easily from anywhere in the code.
These days errno is typically put in thread-local storage.
There are some situations (eg: shared libraries like DLLs that require startup code) where using thread-local storage can be a problem.
I've only needed it for thread-specific error handling, and optimization (in C):
__thread int cpfs_errno;
static __thread struct Cpfs *g_cpfs;
In this example, this saves me passing a context pointer of struct Cpfs * through dozens of functions in which it never changes.
I'm not sure how pthread thread-specific data works. Considering the following code (found on the web), does this mean I can create, for example, 5 threads in main and call func in only some of them (let's say 2), so that those threads would have the data for 'key' set to something (ptr = malloc(OBJECT_SIZE)) and the other threads would have the same key existing but with a NULL value?
static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Runs exactly once per process, via pthread_once() below. */
static void
make_key(void)
{
    (void) pthread_key_create(&key, NULL);
}

void
func(void)
{
    void *ptr;

    (void) pthread_once(&key_once, make_key);
    /* NULL means this thread has not stored a value yet. */
    if ((ptr = pthread_getspecific(key)) == NULL) {
        ptr = malloc(OBJECT_SIZE);
        ...
        (void) pthread_setspecific(key, ptr);
    }
    ...
}
Some explanation on how dataspecific works and how it may have been implemented in pthread (simple way) would be appreciated!
Your reasoning is correct. These calls are for thread-specific data. They're a way of giving each thread a "global" area where it can store what it needs, but only if it needs it.
The key is shared among all threads, since it's created with pthread_once() the first time it's needed, but the value given to that key is different for each thread (unless it remains set to NULL). Because the value is a void* pointing to a memory block, a thread that needs thread-specific data can allocate it and save the address for later use. And threads that don't call a routine that needs thread-specific data never waste memory, since it's never allocated for them.
The one area where I have used them is to make a standard C library thread-safe. The strtok() function (as opposed to a thread-safe strtok_r() which was considered an abomination when we were doing this) in an implementation I was involved in used almost this exact same code the first time it was called, to allocate some memory which would be used by strtok() for storing information for subsequent calls. These subsequent calls would retrieve the thread-specific data to continue tokenizing the string without interfering with other threads doing the exact same thing.
It meant users of the library didn't have to worry about cross-talk between threads - they still had to ensure a single thread didn't call the function until the last one had finished but that's the same as with single-threaded code.
It allowed us to give a 'proper' C environment to each thread running in our system without the usual "you have to call these special non-standard re-entrant routines" limitations that other vendors imposed on their users.
As for implementation, from what I remember of DCE user-mode threads (which I think were the precursor to the current pthreads), each thread had a single structure which stored things like instruction pointers, stack pointers, register contents and so on. It was a very simple matter to add one pointer to this structure to achieve very powerful functionality with minimal cost. The pointer pointed to an array (a linked list in some implementations) of key/pointer pairs so each thread could have multiple keys (e.g., one for strtok(), one for rand()).
The answer to your first question is yes. In simple terms, it allows each thread to allocate and save its own data. This is roughly equivalent to each thread simply allocating and passing around its own data structure. The API saves you the trouble of passing the thread-local structure to all subfunctions, and allows you to look it up on demand instead.
The implementation really doesn't matter all that much (it may vary per-OS), as long as the results are the same.
You can think of it as a two-level hashmap. The key specifies which thread-local "variable" you want to access, and the second level might perform a thread-id lookup to request the per-thread value.
Does anybody know of (or can post) an example of the use of thread-specific data? I'm looking for something that's clearly explained and easy to understand. I've got a global char * variable that I'm wanting to share between a couple threads, and I'm thinking this is what the thread specific data mechanism in C is for, am I right?
I'm a Linux user!
Actually, thread-specific data is for when you DON'T want to share data between threads -- with thread-specific data, each thread can use the same variable name, but that variable refers to distinct storage.
With gcc, you can declare a variable as thread-specific using the __thread storage-class keyword. If you are only trying to make a primitive type thread-specific, and you are only dealing with Linux and GCC, then this is a possible solution. If you actually want to be portable, though, between various unices (a desirable goal), or if you want to make complex data types thread-specific, then you need to use the UNIX routines for that...
The way it works in UNIX is that you use pthread_key_create before any thread is spawned, in order to create a unique variable name. You then use pthread_setspecific and pthread_getspecific to modify/access the data associated with the key. The semantics of the set/get specific functions is that the key behaves as an index into a map, where each thread has its own map, so executing these routines from different threads causes different data to be accessed/modified. If you can use a map, you can use thread-specific storage.
Obviously, when you are done, you need to call the appropriate routines to clean up the data. You can use pthread_cleanup_push to schedule a cleanup routine to deallocate any data structures you have associated with the thread-specific key (or pass a destructor function as the second argument to pthread_key_create, which is run at thread exit for each thread whose value is non-NULL), and you can use pthread_key_destroy when the key is no longer in use.
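Putting the pieces together, here is a minimal sketch assuming POSIX threads. It uses the destructor argument of pthread_key_create for cleanup rather than pthread_cleanup_push, and the names my_counter, free_value and key_demo are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t  key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static pthread_mutex_t freed_lock = PTHREAD_MUTEX_INITIALIZER;
static int freed;   /* how many per-thread values the destructor released */

/* Called at thread exit for every thread whose value is non-NULL. */
static void free_value(void *p)
{
    free(p);
    pthread_mutex_lock(&freed_lock);
    freed++;
    pthread_mutex_unlock(&freed_lock);
}

static void make_key(void)
{
    int rc = pthread_key_create(&key, free_value);
    assert(rc == 0);
}

/* Returns the calling thread's private counter, allocating it on first use. */
static int *my_counter(void)
{
    int *p;

    pthread_once(&key_once, make_key);
    p = pthread_getspecific(key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        assert(p != NULL);
        pthread_setspecific(key, p);
    }
    return p;
}

static void *worker(void *arg)
{
    int *n = arg;

    for (int i = 0; i < *n; i++)
        ++*my_counter();
    *n = *my_counter() * 10;   /* report the per-thread total back */
    return NULL;
}

/* Returns 1 if both threads kept separate counters and both were freed. */
int key_demo(void)
{
    pthread_t t[2];
    int n[2] = { 3, 5 };

    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &n[i]);
        assert(rc == 0);   /* demo only; real code should handle failure */
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    /* The calling thread never called my_counter(), so its value is NULL. */
    return n[0] == 30 && n[1] == 50 && freed == 2
        && pthread_getspecific(key) == NULL;
}
```

The key is one process-wide name, but each thread that asks for the value gets its own block, and a thread that never asks sees NULL and costs nothing.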
The errno variable from the original C runtime library is a good example. If a process has two threads making system calls, it would be extremely bad for that to be a shared variable.
thread 1:
    int f = open (...);
    if (f < 0)
        printf ("error %d encountered\n", errno);
thread 2:
    int s = socket (...);
    if (s < 0)
        printf ("error %d encountered\n", errno);
Imagine the confusion if open and socket are called at about the same time, both fail somehow, and both try to display the error number!
To solve this, multi-threaded runtime libraries make errno an item of thread-specific data.
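One way to see this for yourself is to compare the address of errno in two threads. This is only a sketch, assuming POSIX threads and a runtime (such as glibc) where errno expands to a per-thread lvalue; struct probe and errno_is_per_thread are invented names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct probe {
    int *main_errno;   /* address of errno as seen by the creating thread */
    int  distinct;     /* set to 1 if the worker's errno lives elsewhere */
};

static void *worker(void *arg)
{
    struct probe *p = arg;

    errno = EINVAL;    /* touches only this thread's errno */
    p->distinct = (&errno != p->main_errno);
    return NULL;
}

/* Returns 1 if the worker thread has its own errno object. */
int errno_is_per_thread(void)
{
    struct probe p = { &errno, 0 };
    pthread_t t;
    int rc = pthread_create(&t, NULL, worker, &p);

    assert(rc == 0);   /* demo only; real code should handle failure */
    pthread_join(t, NULL);
    return p.distinct;
}
```

The comparison is made inside the worker while the creating thread is still alive, so both addresses refer to live objects.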
The short answer to your question is: you don't have to do anything to share a variable between multiple threads. All global variables are shared among all threads by default.
When a variable has to be different for each thread, and your compiler supports the extension (as GCC does), you only need to add the __thread storage class keyword to your variable declaration (C11 standardizes the same idea as _Thread_local), as in:
__thread char *variable;
This instructs every stage of the toolchain (cc, ld, ld.so, libc.so and libpthread.so) to handle this variable in a special thread-specific way.
The following compilers support this syntax (cf wikipedia):
Sun Studio C/C++
IBM XL C/C++
GNU C
Intel C/C++ (Linux systems)
Borland C++ Builder