Per-thread state vs. shared state in threads - c

I'm trying to understand the details in the TCB (thread control block and the differences between per-thread states and shared states. My book has its own implementation of pthread, so it gives an example with this mini C program (I've not typed the whole thing out)
#include "thread.h"
static void go(int n);
static thread_t threads[NTHREADS];
#define NTHREADS 10
int main(int argh, char **argv) {
int i;
long exitValue;
for (i = 0; i < NTHREADS; i++) {
thread_create(&threads[i]), &go, i);
}
for (i = 0; i < NTHREADS; i++) {
exitValue = thread_join(threads[i]);
}
printf("Main thread done".\n);
return 0;
}
void go(int n) {
printf("Hello from thread %d\n", n);
thread_exit(100 + n);
}
What would the variables i and exitValue (in the main() function) be examples of? They're not shared state since they're not global variables, but I'm not sure if they're per-thread state either. The i is used as the parameter for the go function when each thread is being created, so I'm a bit confused about it. The exitValue's scope is limited only to main() so that seems like it would just be stored on the process' stack. The int n as the parameter for the void go() would be a per-thread variable because its value is independent for each thread. I don't think I fully understand these concepts so any help would be appreciated! Thanks!

Short Answer
All of the variables in your example program are automatic variables. Each time one of them comes into scope storage for it is allocated, and when it leaves its scope it is no longer valid. This concept is independent of whether the variables is shared or not.
Longer Answer
The scope of a variable refers to its lifetime in the program (and also the rules for how it can be accessed). In your program the variables i and exitValue are scoped to the main function. Typically a compiler will allocate space on the stack which is used to store the values for these variables.
The variable n in function go is a parameter to the function and so it also acts as a local variable in the function go. So each time go is executed the compiler will allocate space on the stack frame for the variables n (although the compiler may be able to perform optimization to keep the local variables in registers rather than actually allocating stack space). However, as a parameter n will be initialized with whatever value it was called with (its actual parameter).
To make this more concrete, here is what the values of the variales in the program would be after the first loop has completed 2 iterations (assuming that the spawned threads haven't finished executing).
Main thread: i = 2, exitValue = 0
Thread 0: n = 0
Thread 1: n = 1
The thing to note is that there are multiple independent copies of the variable n. And that n gets a copy of the value in i when thread_create is executed, but that the values of i and n are independent after that.
Finally I'm not certain what is supposed to happen with the statement exitValue = thread_join(threads[i]); since this is a variation of pthreads. But what probably happens is that it makes the value available when another thread calls thread_join. So in that way you do get some data sharing between threads, but the sharing is synchronized by the thread_join command.

They're objects with automatic storage, casually known as "local variables" although the latter is ambiguous since C and C++ both allow objects with local scope but that only have one global instance via the static keyword.

Related

Do all threads have the same global variable?

I have a general question that occured to me while trying to implement a thread sychronization problem with sempaphores. I do not want to get into much (unrelated) detail so I am going to give the code that I think is important to clarify my question.
sem_t *mysema;
violatile int counter;
struct my_info{
pthread_t t;
int id;
};
void *barrier (void *arg){
struct my_info *a = arg;
arg->id = thrid;
while(counter >0){
do_work(&mysem[thrid])
sem_wait(&mysema[third])
display_my_work(arg);
counter--;
sem_post(&mysema[thrid+1])
}
return NULL;
}
int main(int argc, char *argv[]){
int N = atoi(argv[1]);
mysema = mallon(N*(*mysema));
counter = 50;
/*semaphore intialisations */
for(i=0; i<M; i++){
sem_init(&mysema[i],0,0);
}
for(i=0; i<M; i++){
mysema[i].id = i;
}
for(i=0; i<M; i++){
pthread_create(&mysema.t[i], NULL, barrier, &tinfo[i])
}
/*init wake up the first sempahore */
sem_post(&mysema[0]);
.
.
.
We have an array made of M semaphores intialised in 0 , where M is defined in command line by the user.
I know I am done when all M threads in total have done some necessary computations 50 times.
Each thread blocks itself, until the previous thread "sem_post's" it. The very first thread will be waken up by init.
My question is whether the threads will stop when '''counter = 0 '''. Do they all see the same variable - counter? (It is a global one, initialised in the main).
If thread zero , makes the very first time ```counter = 49''' do all the other threads( thread 1, 2, ...M-1) see that ?
These are different questions:
Do [the threads] all see the same variable - counter? (It is a global one, initialised in the main).
If thread zero , makes the very first time ```counter = 49''' do all the other threads( thread 1, 2, ...M-1) see that ?
The first is fairly simple: yes. An object declared at file scope and without storage class specifier _Thread_local is a single object whose storage duration is the entire run of the program. Wherever that object's identifier is in-scope and visible, it identifies the same object regardless of which thread is accessing it.
The answer to the second question is more complicated. In a multi-threaded program there is the potential for data races, and the behavior of a program containing a data race is undefined. The volatile qualifier does not protect against these; instead, you need proper synchronization for all accesses to each shared variable, both reads and writes. This can be provided by a semaphore or more often a mutex, among other possibilities.
Your code's decrement of counter may be adequately protected, but I suspect not, on account of the threads using different semaphores. If this allows for multiple different threads to execute the ...
display_my_work(arg);
counter--;
... lines at the same time then you have a data race. Even if your protection is adequate there, however, the read of counter in the while condition clearly is not properly synchronized, and you definitely have a data race there.
One of the common manifestations of the undefined behavior brought on by data races is that threads do not see each others' updates, so not only does your program's undefined behavior generally mean that threads 1 ... M-1 may not see thread 0's update of counter, it also specifically makes such a failure comparatively probable.

If I declare a variable inside a for loop in C, will it be created multiple times or not?

#include <stdio.h>
int main()
{
for(int i=0;i<100;i++)
{
int count=0;
printf("%d ",++count);
}
return 0;
}
output of the above program is: 1 1 1 1 1 1..........1
Please take a look at the code above. I declared variable "int count=0" inside the for loop.
With my knowledge, the scope of the variable is within the block, so count variable will be alive up to for loop execution.
"int count=0" is executing 100 times, then it has to create the variable 100 times else it has to give the error (re-declaration of the count variable), but it's not happening like that — what may be the reason?
According to output the variable is initializing with zero every time.
Please help me to find the reason.
Such simple code can be visualised on http://www.pythontutor.com/c.html for easy understanding.
To answer your question, count gets destroyed when it goes outside its scope, that is the closing } of the loop. On next iteration, a variable of the same name is created and initialised to 0, which is used by the printf.
And if counting is your goal, print i instead of count.
The C standard describes the C language using an abstract model of a computer. In this model, count is created each time the body of the loop is executed, and it is destroyed when execution of the body ends. By “created” and “destroyed,” we mean that memory is reserved for it and is released, and that the initialization is performed with the reservation.
The C standard does not require compilers to implement this model slavishly. Most compilers will allocate a fixed amount of stack space when the routine starts, with space for count included in this fixed amount, and then count will use that same space in each iteration. Then, if we look at the assembly code generated, we will not see any reservation or release of memory; the stack will be grown and shrunk only once for the whole routine, not grown and shrunk in each loop iteration.
Thus, the answer is twofold:
In C’s abstract model of computing, a new lifetime of count begins and ends in each loop iteration.
In most actual implementations, memory is reserved just once for count, although implementations may also allocate and release memory in each iteration.
However, even if you know your C implementation allocates stack space just once per routine when it can, you should generally think about programs in the C model in this regard. Consider this:
for (int i = 0; i < 100; ++i)
{
int count = 0;
// Do some things with count.
float x = 0;
// Do some things with x.
}
In this code, the compiler might allocate four bytes of stack space to use for both count and x, to be used for one of them at a time. The routine would grow the stack once, when it starts, including four bytes to use for count and x. In each iteration of the loop, it would use the memory first for count and then for x. This lets us see that the memory is first reserved for count, then released, then reserved for x, then released, and then that repeats in each iteration. The reservations and releases occur conceptually even though there are no instructions to grow and shrink the stack.
Another illuminating example is:
for (int i = 0; i < 100; ++i)
{
extern int baz(void);
int a[baz()], b[baz()];
extern void bar(void *, void *);
bar(a, b);
}
In this case, the compiler cannot reserve memory for a and b when the routine starts because it does not know how much memory it will need. In each iteration, it must call baz to find how much memory is needed for a and how much for b, and then it must allocate stack space (or other memory) for them. Further, since the sizes may vary from iteration to iteration, it is not possible for both a and b to start in the same place in each iteration—one of them must move to make way for the other. So this code lets us see that a new a and a new b must be created in each iteration.
int count=0 is executing 100 times, then it has to create the variable 100 times
No, it defines the variable count once, then assigns it the value 0 100 times.
Defining a variable in C does not involve any particular step or code to "create" it (unlike for example in C++, where simply defining a variable may default-construct it). Variable definitions just associate the name with an "entity" that represents the variable internally, and definitions are tied to the scope where they appear.
Assigning a variable is a statement which gets executed during the normal program flow. It usually has "observable effects", otherwise the compiler is allowed to optimize it out entirely.
OP's example can be rewritten in a completely equivalent form as follows.
for(int i=0;i<100;i++)
{
int count; // definition of variable count - defined once in this {} scope
count=0; // assignment of value 0 to count - executed once per iteration, 100 times total
printf("%d ",++count);
}
Eric has it correct. In much shorter form:
Typically compilers determine at compile time how much memory is needed by a function and the offsets in the stack to those variables. The actual memory allocations occur on each function call and memory release on the function return.
Further, when you have variables nested within {curly braces} once execution leaves that brace set the compiler is free to reuse that memory for other variables in the function. There are two reasons I intentionally do this:
The variables are large but only needed for a short time so why make stacks larger than needed? Especially if you need several large temporary structures or arrays at different times. The smaller the scope the less chance of bugs.
If a variable only has a sane value for a limited amount of time, and would be dangerous or buggy to use out of that scope, add extra curly braces to limit the scope of access so improper use generates immediate compiler errors. Using unique names for each variable, even if the compiler doesn't insist on it, can help the debugger, and your mind, less confused.
Example:
your_function(int a)
{
{ // limit scope of stack_1
int stack_1 = 0;
for ( int ii = 0; ii < a; ++ii ) { // really limit scope of ii
stack_1 += some_calculation(i, a);
}
printf("ii=%d\n", ii); // scope error
printf("stack_1=%d\n", stack_1); // good
} // done with stack_1
{
int limited_scope_1[10000];
do_something(a,limited_scope_1);
}
{
float limited_scope_2[10000];
do_something_else(a,limited_scope_2);
}
}
A compiler given code like:
void do_something(int, int*);
...
for (int i=0; i<100; i++)
{
int const j=(i & 1);
doSomething(i, &j);
}
could legitimately replace it with:
void do_something(int, int*);
...
int const __compiler_generated_0 = 0;
int const __compiler_generated_1 = 1;
for (int i=0; i<100; i+=2)
{
doSomething(i, &compiler_generated_0);
doSomething(i+1, &compiler_generated_1);
}
Although a compiler would typically allocate space on the stack once for j, when the function was entered, and then not reuse the storage during the loop (or even the function), meaning that j would have the same address on every iteration of the loop, there is no requirement that the address remain constant. While there typically wouldn't be an advantage to having the address vary on different iterations, compilers are be allowed to exploit such situations should they arise.

Static vs. Malloc

What is the advantage of the static keyword in block scope vs. using malloc?
For example:
Function A:
f() {
static int x = 7;
}
Function B:
f() {
int *x = malloc(sizeof(int));
if (x != NULL)
*x = 7;
}
If I am understanding this correctly, both programs create an integer 7 that is stored on the heap. In A, the variable is created at the very beginning in some permanent storage, before the main method executes. In B, you are allocating the memory on the spot once the function is called and then storing a 7 where that pointer points. In what type of situations might you use one method over the other? I know that you cannot free the x in function A, so wouldn't that make B generally more preferable?
Both programs create an integer 7 that is stored on the heap
No, they don't.
static creates a object with static storage duration which remains alive throughout the lifetime of the program. While a dynamically allocated object(created by malloc) remains in memory until explicitly deleted by free. Both provide distinct functionality. static maintains the state of the object within function calls while dynamically allocated object does not.
In what type of situations might you use one method over the other?
You use static when you want the object to be alive throughout the lifetime of program and maintain its state within function calls. If you are working in a multithreaded environment the same static object will be shared for all the threads and hence would need synchronization.
You use malloc when you explicitly want to control the lifetime of the object.for e.g: Making sure the object lives long enough till caller of function accesses it after the function call.(An automatic/local object will be deallocated once the scope{ } of the function ends). Unless the caller explicitly calls free the allocated memory is leaked until the OS reclaims it at program exit.
In Function A, you're allocating x with static storage duration, which generally means it is not on (what most people recognize as) the heap. Rather, it's just memory that's guaranteed to exist the entire time your program is running.
In Function B, you're allocating the storage every time you enter the function, and then (unless there's a free you haven't shown) leaking that memory.
Given only those two choices, Function A is clearly preferable. It has shortcomings (especially in the face of multi-threading) but at least there are some circumstances under which it's correct. Function B (as it stands) is just plain wrong.
Forget stack v. heap. That is not the most important thing that is going on here.
Sometimes static modifies scope and sometimes it modifies lifetime. Prototypical example:
void int foo() {
static int count = 0;
return count++;
}
Try calling this repeatedly, perhaps from several different functions or files even, and you'll see that count keeps increasing, because in this case static gives the variable a lifetime equal to that of the entire execution of the program.
Read http://www.teigfam.net/oyvind/pub/notes/09_ANSI_C_static_vs_malloc.html
The static variable is created before main() and memory does not need to be allocated after running the program.
If I am understanding this correctly, both programs create an integer 7 that is stored on the heap
No, static variables are created in Data or BSS segment, and they have lifetime throughout the lifetime of the program. When you alloc using malloc(), memory is allocated in heap, and they must be explicitly freed using free() call.
In what type of situations might you use one method over the other?
Well, you use the first method, when you want access to the same variable for the multiple invocation of the same function. ie, in your example, x will only initialized once, and when you call the method for the second time, the same x variable is used.
Second method can be used, when you don't want to share the variable for multiple invocation of the function, so that this function is called for the second time, x is malloced again.
You must free x every time.
You can see the difference by calling f() 2 times, for each kind of f()
...
f();
f();
...
f(){
static int x = 7;
printf("x is : %d", x++);
}
f(){
int *x = malloc(sizeof(int));
if (x != NULL)
*x = 7;
printf("x is : %d", (*x)++);
free(x); //never forget this,
}
the results will be different
First things first , static is a storage class , and malloc() is an API , which triggers the brk() system call to allocate memory on the heap.
If I am understanding this correctly, both programs create an integer
7 that is stored on the heap ?
No.Static variables are stored in the data section of the memory allocated to the program. Even though if the scope of a static variable ends , it can still be accessed outside its scope , this may indicate that , the contents of data segment , has a lifetime independent of scope.
In what type of situations might you use one method over the other?
If you want more control , within a given scope ,over your memory use malloc()/free(), else the simpler (and more cleaner) way is to use static.
In terms of performance , declaring a variable static is much faster , than allocating it on the heap . since the algorithms for heap management is complex and the time needed to service a heap request varies depending on the type of algorithm
One more reason i can think of suggesting static is that , the static variables are by default initialized to zero , so one more less thing to worry about.
consider below exaple to understand how static works. Generally we use static keyword to define scope of variable or function. e.g. a variable defined as static will be restricted within the function and will retail its value.
But as shown in below sample program if you pass the reference of the static variable to any other function you can still update the same variable from any other function.
But precisely the static variable dies when the program terminates, it means the memory will be freed.
#include <stdio.h>
void f2(int *j)
{
(*j)++;
printf("%d\n", *j);
}
void f1()
{
static int i = 10;
printf("%d\n", i);
f2(&i);
printf("%d\n", i);
}
int main()
{
f1();
return 0;
}
But in case of malloc(), memory will not be freed on termination of the program unless and untill programmer takes care of freeing the memory using free() before termination of the program.
This way you will feel that using malloc() we can have control over variable lifespan but beware...you have to be very precise in allocating and freeing the memory when you choose dynamic memory allocation.
If you forget to free the memory and program terminated that part of heap cannot be used to allocate memory by other process. This will probably lead to starvation of memory in real world and slows down the computation. To come out of such situation you have to manually reboot the system.

thread parameter passing c

My question regards parameter passing to a thread.
I have a function Foo that operates on an array, say arrayA. To speed things up, Foo is coded to operate at both directions on the array. So, Foo takes arrayA and an integer X as parameters. Depending on the value of X, it operates in forward or reverse direction.
I'm looking to avoid making global use of "arrayA" and "X". So, I'm after passing "arrayA" and "X" as parameters to Foo, and creating two threads to run Foo-- one in each direction. Here's what I did:
typedef struct {int* arrayA[MSIZE]; int X; } TP; //arrayPack=TP
void Foo (void *tP) {
TP *tp = (TP*)tP; // cast the parameter tP back to what it is and assign to pointer *tp
int x;
printf("\nX: %d", tp->X);
printf("\n arrayA: "); for (x=0; x<tp->arrayA.size(); printf("%d ", aP->arrayA[x]), x++);
} // end Foo
void callingRouting () {
int* arrayA[MSIZE] = {3,5,7,9};
TP tp; tp.arrayA=arrayA;
tp.X=0; _beginthread(Foo, 0, (void*)&tp); // process -- forward
tp.X=1; _beginthread(Foo, 0, (void*)&tp); // process -- reverse
}
The values aren`t passed-- my array is printing empty and I'm not getting the value of X printed right. What am i missing ?
I would also appreciate suggestions on some readings on this-- passing parameters to threads-- especially on passing the resources shared by the threads. Thanks.
You're passing the address of a stack variable to your thread function, once callingRouting exits the TP structure no longer exists. They need to be either globals or allocated on the heap.
However you'll need two copies of the TP for each thread as the change tp.X=1 may be visible to both threads.
There are problems there but how you see them depends on how the OS decides to schedule the threads on each execution.
The first thing to remember is that you have a thread that is starting up two other threads. Since you do not have any control over the processor time slice and how that is allocated, you can not be sure when the two other threads will start and may be not even the order that they will start in.
Since you hare using an array on the stack that is local to the function callingRouting () as soon as that function returns the local variables allocated will basically be out of scope and can no longer be depended on.
So there are a couple of ways to do this.
The first is to use global or static memory variables for these data items being passed to the threads.
The other is to start both threads and then wait for both to complete before continuing.
Since you do not know when or the order of the threads being started, you really should use two different TP type variables, one for each thread. Otherwise you run the risk of the time slice allocation to be such that both threads will have the same TP data.

How can I make a parameter vector local for each Kernel in Cuda

How can i make a parameter vector treated as a local variable in each instance in cuda?
__global__ void kern(char *text, int N){
//if i change text[0]='0'; the change only affects the current instance of the kernel and not the other threads
}
Thanks!
Every thread will receive the same input parameters, so in this case char *text is the same in every thread - that's a fundamental part of the programming model. Since the pointer points to global memory, if one thread changes data through the pointer (i.e. modifies the global memory) then the change affects all threads (ignoring hazards).
This is exactly the same as standard C, except now you have multiple threads accessing through the pointer. In other words, if you modify text[0] inside a standard C function then the changes are visible outside the function.
If I understand correctly, you're asking for every thread to have a local copy of the contents of text. Well the solution is exactly the same as for standard C if you don't want changes visible outside the function:
__global__ void kern(char* text, int N) {
// If you have an upper bound for N...
char localtext[NMAX];
// If you don't know the range of N...
char *localtext;
localtext = malloc(N*sizeof(char));
// Copy from text to localtext
// Since each thread has the same value for i this will
// broadcast from the L1 cache
for (int i = 0 ; i < N ; i++)
localtext[i] = text[i];
//...
}
Note that I'm assuming you have sm_20 or later. Also note that while using malloc in device code is possible, you will pay a performance price.

Resources