Why thread specific data is required in pthread?

Why thread specific data is required in pthread? - c

all the threads share memory location. For example a global variable changes in one thread will reflect in another thread. Since each thread has its own stack, the local
variables that are created inside the thread is unique. In this case, why do we need
to go for thread specific data mechanism?. Can't it be achieved by auto storage varibles
inside the thread function ?
Kindly clarify!!!.
BR
Rj

Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.

It's not required but it's rather handy. Some functions like rand and strtok use static storage duration information which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
int seed;
srand (&seed, time (NULL));
int r = rand_r (void *seed);
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
that uses thread-local storage to maintain a thread-specific seed. Similarly with the information used by strtok regarding the locations within the string it's processing.
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now you could create that information in the thread function but how is the rand function going to know about it's address without it being passed down. And what if rand is called 87 stack levels down? That's an awful lot of levels to be transferring a pointer through.
And, even if you do something like:
void pthread_fn (void *unused) {
int seed;
rand_set_seed_location (&seed);
:
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)

Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).

The best example for thread specific data is the "errno". When a call to some function in c library failed, the errno is set, and you can check it out to find the reason of the failure. If there's no thread specific data, it's impossible to port these functions to multi-thread environment because the errno could be set by other threads before you check it.

As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.

Related

Calling a function from inside a pthread?

If I have a threaded function, and from inside it I call another function (that is located in a separate .c file), does each instance of the threaded function get it's own version of the called function? Example:
file 1
void* threaded_func(void* data) {
function2(...);
}
file 2
int a;
int function2(...) {
int b;
}
In my example, would int a be the same int a for all threads? What about for variables I define inside function2 (i.e int b)?
Some context:
I'm creating a simple HTTP server and my function2 is a serve_client function which should only return when the connection is closed. I'm having a problem that I think is due to some variables in file2 (int a) being overwritten for each new thread. I think my problem is me expecting a C source file to behave like a class would in Java.

does each instance of the threaded function get it's own version of the called function?
Loosely speaking, that's correct, provided the function doesn't use any global variables (e.g., int a; in your example) or any variable with static storage duration. If a is "shared" by multiple threads, that's likely to the source of your problem.
If a needs to be shared among threads, then you need to synchronize its access. If you a needs to be unique to each of threads, you need thread local storage. E.g., using C11's __Thread_local or gcc's __thread.

A C function is just machine code, and could be run by several threads (or processes). What matters is the data used by that function. C don't have closures (you could emulate them with callbacks, that is some function using additional client data). As answered by usr read also about thread local storage duration with thread_local. Each thread has its own call stack.
You should first read some POSIX thread tutorial.
You need to care about synchronization (and semaphores), notably using mutexes (and lock so serialize access to global shared variables). Read also about ACID transactions. Sometimes you can use atomic types and atomic operations.
BTW, you might want to use some HTTP server library like libonion.
You should look into the source code of existing free software projects (e.g. on github).
On Linux, read also about pthreads(7), nptl(7) and futex(7) and sem_overview(7) and clone(2).

Is CryptGenRandom() thread-safe?

Is CryptGenRandom() thread-safe with a single global program-wide HCRYPTPROV instance?
MSDN appears to lack any info on this: https://msdn.microsoft.com/en-us/library/windows/desktop/aa379942(v=vs.85).aspx
Creating a separate HCRYPTPROV per thread and destroying it again would significantly complicate matters (and also risk more security-relevant bugs on my side), so this would be really useful to know. Sharing one global HCRYPTPROV would be a lot easier for sure.
So does anyone here know about the thread-safety of CryptGenRandom(), particularly with a single HCRYPTPROV instance?

Creating a separate HCRYPTPROV per thread doesn't make much sense. This is pointer to memory block from heap in all current implementations, primarily saved pointers to CSP entry points which used to call actual provider implementation (CPGenRandom in our case). The references themselves do not contain state of the CSP, unlike for example HCRYPTKEY which containing actual key state. So even if you create a separate HCRYPTPROV for every thread - this changes nothing.
There may be some global variables / data used by CSP internally during this call; this is however unknown as these would be implementation details. Of course we can serialize calls to CryptGenRandom in the code. However we cannot control that some other dll in our process also call CryptGenRandom concurrently. So serializing all calls to CryptGenRandom also impossible.
As result I think the CPGenRandom must be design to be thread-safe. and it my tests with a well known Microsoft CSP this is true. Internal synchronization is used in function, when need access global data and if multiple threads call CPGenRandom concurrently; every thread receives unique random data.
So my conclusion - CryptGenRandom is thread-safe, at least for all Microsoft CSP

is sprintf thread safe?

is sprintf thread safe ?
//Global log buffer
char logBuffer[20];
logStatus (char * status, int length)
{
snprintf(logBuffer, 19, status);
printf ("%s\n", logBuffer);
}
The thread safety of this function totally depends upon the thread safety of snprintf/sprintf .
Updates :
thanks for ur answers .
i dont mind, if the actual contents gts messed up. but want to confirm that the sprintf would not cause a memory corruption / buffer overflow going beyond 20 bytes in this case, when multiple threads are trying to write to logBuffer ?

There is no problem using snprintf() in multiple threads. But here you are writing to a shared string buffer, which I assume is shared across threads.
So your use of this function would not be thread safe.

Your question has an incorrect premise. Even if sprintf itself can be safely called from multiple threads at the same time (as I sure hope it can), your code is not protecting your global variable. The standard library can't possibly help you there.

You have several problems with your code.
Your usage of snprintf is very suspicious. Don't use it just to
copy a string. Generally don't pass dynamically allocated strings
with whatever content as format to any of the printf functions.
They interpret the contents and if there is anything in them that
resembles a %-format, you are doomed.
Don't use static buffers as you do. This is certainly neither
thread safe not re-entrant.
Either use printf with an appropriate format directly, or replace
the call by puts.
Then, Linux adheres to the POSIX standard, which requires that the standard IO functions are thread safe.

Regarding your update about not worrying if the logBuffer content get garbled:
I'm not sure why you want to avoid making your function completely thread safe by using a locally allocated buffer or some synchronization mechanism, but if you want to know what POSIX has to say about it, here you go (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11):
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. [followed by a list of functions which provide synchronization]
So, POSIX says that your program needs to make sure mutilple threads won't be modifying logBuffer concurrently (or modifying logBuffer in one thread while reading it in another). If you don't hold to that, there's no promise made that the worst that will happen is garbled data in logBuffer. There's simply no promise made at all about what the results will be. I don't know if Linux might document a more specific behavior, but I doubt it does.

"There is no problem using snprintf() in multiple threads."
Not true.
Not true, at least in case of POSIX functions.
All of the standard vararg functions are not mt-safe - this includes all the printf() family (1), but also every other variadic function as well (2)
sprintf() for example is: "MT-Safe locale|AS-Unsafe heap|AC-Unsafe mem" - what means, that it can fail if locale is set asynchronously or if asynchronous cancellation of threads is used. In other words, special attention must be paid when using such functions in MT environment.
va_arg is not mt-safe: |MT-Safe race:ap|AS-Safe|AC-Unsafe corrupt| - what means, that inter-locking is needed.
Additionally, what should be obvious, even totally mt-safe function can be used in unsafe way - what happens for example if two or more threads are operating the same data/memory areas.

It's not thread safe, since the buffer where you sprintf is shared between all threads.

"Do you have a refernce which says that they are not thread safe? When I Google, it seems that they are"
My previous answer to this question has been removed/deleted (why?), so I'll try again, using different approach:
AC (async. cancellation of threads): this is obviously a case when almost all of the "apparently MT-safe" code can fail, simply because the thread is interrupted at a random point of time, so none of synchronization methods are guaranted to work correctly (i.e. any form of mutex can't be really guranteed to work correctly)
Threads can use the same malloc() arena, what means, that if one of the threads will fail (i.e. it'll damage the malloc arena) then all the consecutive calls to malloc() will/can cause critical errors - this of course depends on system configuration - but it also means, that nobody should assume that malformed memory (de)allocations are safe.
Since all of the systems are providing the option to use different local settings, it is obvious, that async. change to the "locale" settings can cause errors...
Regards.

What is a Re-entrant procedure?

What is a re entrant procedure and can you give an example scenario of when it is used?
Edit: Also, can multiple processes access a re entrant procedure in parallel?
Please provide a different way of explaining than wikipedia as I don't totally understand their description hence my question here

The idea behind re-entrancy is that the routine may be called while it is in the middle of executing already and it will still work right.
Generally this is achieved by it using only parameters and local variables declared on the stack (in C terms, no static locals). It would also be important that it not lock any global resources during execution.
Now, you may ask, "How would such a weird thing as a routine being run multiple times at once happen?" Well, some ways this could happen are:
The routine is recursive (or mutually-recursive with some other set of routines).
It gets called by another thread.
It gets called by an interrupt.
If any of these happen, and the routine is modifying a global (or C static local), then the new execution could potentially wipe out the changes the first execution made. As an example, if that global was used as a loop control variable, it might cause the first execution, when it finally gets to resume, to loop the wrong number of times.

It is a subroutine which can be called when it is already active. For instance recursive functions are often reentrant. Functions which are called from signal handlers must be reentrant as well. A reentrant function is thread-safe but not all thread-safe one are reentrant.

A reentrant procedure is one in which a single copy of the program code can be shared by multiple users during the same period of time. Re entrance has two key aspects: The program code cannot modify itself and the local data for each user must be stored separately.
In a shared system, reentrancy allows more efficient use of main memory: One copy of the program code is kept in main memory, but more than one application can call the procedure. Thus, a reentrant procedure must have a permanent part( the instructions that make up the procedure) and a temporary part(a pointer back to the calling program as well as memory for local variables used by the program).
Each execution instance, called activation, of a procedure will execute the code in the permanent part but must have its own copy of local variables and parameters. The temporary part associated with a particular activation is referred to as an activation record.
The most convenient way to support reentrant procedures is by means of a stack. When a reentrant procedure is called, the activation record becomes part of the stack frame that is created on procedure call

C : how pthread dataspecific works?

I'm not sure about how pthread dataspecific works : considering the next code (found on the web), does this means i can create for example 5 threads in the main, have a call to func in only some of them (let's say 2) those threads would have the data 'key' set to something (ptr = malloc(OBJECT_SIZE) ) and the other threads would have the same key existing but with a NULL value?
static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static void
make_key()
{
(void) pthread_key_create(&key, NULL);
}
func()
{
void *ptr;
(void) pthread_once(&key_once, make_key);
if ((ptr = pthread_getspecific(key)) == NULL) {
ptr = malloc(OBJECT_SIZE);
...
(void) pthread_setspecific(key, ptr);
}
...
}
Some explanation on how dataspecific works and how it may have been implemented in pthread (simple way) would be appreciated!

Your reasoning is correct. These calls are for thread-specific data. They're a way of giving each thread a "global" area where it can store what it needs, but only if it needs it.
The key is shared among all threads, since it's created with pthread_once() the first time it's needed, but the value given to that key is different for each thread (unless it remains set to NULL). By having the value a void* to a memory block, a thread that needs thread-specific data can allocate it and save the address for later use. And threads that don't call a routine that needs thread-specific data never waste memory since it's never allocated for them.
The one area where I have used them is to make a standard C library thread-safe. The strtok() function (as opposed to a thread-safe strtok_r() which was considered an abomination when we were doing this) in an implementation I was involved in used almost this exact same code the first time it was called, to allocate some memory which would be used by strtok() for storing information for subsequent calls. These subsequent calls would retrieve the thread-specific data to continue tokenizing the string without interfering with other threads doing the exact same thing.
It meant users of the library didn't have to worry about cross-talk between threads - they still had to ensure a single thread didn't call the function until the last one had finished but that's the same as with single-threaded code.
It allowed us to give a 'proper' C environment to each thread running in our system without the usual "you have to call these special non-standard re-entrant routines" limitations that other vendors imposed on their users.
As for implementation, from what I remember of DCE user-mode threads (which I think were the precursor to the current pthreads), each thread had a single structure which stored things like instruction pointers, stack pointers, register contents and so on. It was a very simple matter to add one pointer to this structure to achieve very powerful functionality with minimal cost. The pointer pointed to a array (linked list in some implementations) of key/pointer pairs so each thread could have multiple keys (e.g., one for strtok(), one for rand()).

The answer to your first question is yes. In simple terms, it allows each thread to allocate and save its own data. This is roughly equivalent to w/o each thread simply allocating and passing around its own data structure. The API saves you the trouble of passing the thread-local structure to all subfunctions, and allows you to look it up on demand instead.
The implementation really doesn't matter all that much (it may vary per-OS), as long as the results are the same.
You can think of it as a two-level hashmap. The key specifies which thread-local "variable" you want to access, and the second level might perform a thread-id lookup to request the per-thread value.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight