I have been writing C code for many years, but I recently came accross a feature that I have never used: a static variable inside a function. Therefore, I was wondering what are some ways that you have used this feature and it was the right design decision.
E.g.
int count(){
static int n;
n = n + 1;
return n;
}
is a BAD design decision. why? because later you might want to decrement the count which would involve changing the function parameters, changing all calling code, ...
Hopefully this is clear enough,
thanks!
void first_call()
{
static int n = 0;
if(!n) {
/* do stuff here only on first call */
n++;
}
/* other stuff here */
}
I have used static variables in test code for lazy initialization of state. Using static local variables in production code is fraught with peril and can lead to subtle bugs. It seems (at least in the code that I generally work on) that nearly any bit of code that starts out as a single-threaded only chunk of code has a nasty habit of eventually ending up working in a concurrent situation. And using a static variable in a concurrent environment can result in difficult issues to debug. The reason for this is because the resulting state change is essentially a hidden side effect.
I have used static variables as a way to control the execution of another thread.
For instance, thread #1 (the main thread) first declares and initializes a control variable such as:
/* on thread #1 */
static bool run_thread = true;
// then initialize the worker thread
and then it starts the execution of thread #2, which is going to do some work until thread #1 decides to stop it:
/* thread #2 */
while (run_thread)
{
// work until thread #1 stops me
}
There is one prominent example that you very much need to be static for protecting critical sections, namely a mutex. As an example for POSIX threads:
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&mut);
/* critical code comes here */
pthread_mutex_unlock(&mut);
This wouldn't work with an auto variable.
POSIX has such static initialzers for mutexes, conditions and once variables.
Related
There are many resources that explain why using volatile alone is not enough for most multi threading applications.
However, is it good enough for simply signalling a thread to exit from another thread?
In the example below, the main thread starts a second thread and, after some time, it wants to stop it. There is no shared data between the threads and also no return code necessary from the thread, the thread is basically just used as a keep alive trigger for an attached hardware.
Example (thread_create and thread_join omitted for brevity, they are basically a wrapper around pthread_create and pthread_join or the Windows equivalent):
typedef struct {
volatile bool keepRunning;
} ThreadContext;
static void thread(void *arg) {
ThreadContext *context = (ThreadContext *)arg;
while (context->keepRunning) {
// do some fast operation
// ...
// then sleep before next iteration
msleep(100);
}
}
static int startThread(ThreadContext *context) {
context->keepRunning = true;
return thread_create(thread, context);
}
static void stopThread(ThreadContext *context) {
context->keepRunning = false;
thread_join();
}
static int main() {
ThreadContext context;
startThread(&context);
msleep(10000);
stopThread(&context);
}
EDIT: I have to keep Windows compatibility and cross compile to a few ancient toolchains, so C11 with stdatomic.h is out of the picture. If there is a way to use regular condition variables on Windows, this might be a way I could try to check if all my targets support them.
EDIT2: Also, in this scenario I don't care about the precise order of events, the thread should only stop eventually (after at most a few iterations) and not wait forever on the join, but it does not have to stop immediately after setting the flag, i.e. it does no harm if the thread does another few iterations.
On most sane architectures and with most sane compilers accessing a variable volatile bool will work. It is however not guaranteed by the standard.
The POSIX standard guarantees safe access to volatile sig_atomic_t type in asynchronous interrupts. To be safe use volatile sig_atomic_t. Or just use pthread_cond.
typedef struct {
volatile sig_atomic_t keepRunning;
} ThreadContext;
In C11 there is a way to check if accessing bool is atomic and safe from interrupt, you should if ATOMIC_BOOL_LOCK_FREE is defined in stdatomic.h.
Some references: C11 draft 5.1.2.3p5 and posix signal.h.
Each thread of my program has its own log file. In my SIGHUP handler I want to notify those threads that when a new log message arrives, they need to reopen their log files.
I want a lock-free solution based purely on flags and counters. (I do have a thread-local context structure for another purpose, so I can add new fields there).
If there was just one logging thread, I would do:
static int need_reopen = 0;
void sighancont(int signo)
...
case SIGHUP:
need_reopen = 1;
break;
...
}
void log(char *msg) {
if (need_reopen) {
need_reopen = 0;
reopen_log();
}
...
}
Of course, if there are multiple logging threads, a simple flag won't do. I'm thinking of something like this:
static volatile int reopen_counter = 0;
void sighancont(int signo)
...
case SIGHUP:
__sync_fetch_and_add(&reopen_counter, 1);
break;
...
}
void log(struct ctx_st *ctx, char *msg) {
int c = reopen_counter;
if (ctx->reopen_counter != c) {
ctx->reopen_counter = c;
reopen_log();
}
...
}
This way the logging threads are supposed to catch-up with the global counter. If the program receives SIGHUP multiple times, log files will be reopened only once.
I see the only way to break this - to send SIGHUP ~4 billion times.
Is there a better (but still simple) algorithm, e.g. with reference counting?
Your solution is simple and efficient. This is kind of a seqlock.
A few notes, to clear possible confusion from comments:
There's no "atomic variable" but atomic instruction. std::atomic, and friends, are just syntactic sugar around atomic ops - you're perfectly ok there.
The counter doesn't have to be volatile, but the accesses have to be. When you write atomic_read(x) you actually say *(volatile int*)&x.
The volatile qualifier causes all accesses to the variable to be done from memory, while you don't necessarily need that.
But, here as well, you're perfectly ok, since you read the variable into a local.
You can update the counter non-atomically if this is the one and only writer (don't forget to make it atomic_write if you remove the volatile). This would be a very small performance improvement.
The only cost here is in the log threads that have to pay for main memory access after the counter is updated. You should expect 200 cycles or so (x2 on other NUMA node)
I have this code:
#include<stdio.h>
#include<pthread.h>
int mutex=1,i=0,full=0;
void p(int *s)
{
while(*s<=0)
;
*s--;
}
void v(int *s)
{
*s++;
}
void *producer()
{
p(&mutex);
printf("Producer is producing\n");
v(&mutex);
v(&full);
}
void *consumer()
{
p(&full);
p(&mutex);
printf("Consuming\n");
v(&mutex);
}
int main()
{
pthread_t thread1,thread2;
int k;
for(k=0;k<10;k++)
{
pthread_create(&thread1,NULL,(void *(*)(void *))producer,NULL);
pthread_create(&thread2,NULL,(void *(*)(void *))consumer,NULL);
}
pthread_join(thread1,NULL);
pthread_join(thread2,NULL);
}
Before adding p(&full) in consumer function, this code was working fine, randomly selecting one out of two functions every time; but after adding p(&full) in consumer() function, every time it is executing producer() function. I don't understand the reason for this.
Can someone please help me,and suggest possible solution for this problem? I want that first time producer function should execute.
Inter-thread synchronisation via shared variables is almost certainly a bad idea, but even so the shared variables should at least be declared volatile.
Consider using real synchronisation primitives such as semaphores or real Pthreads mutexes.
Your use of the term mutex here is incorrect; it is not a mutex. A mutex should be locked and released in the same thread and is intended to prevent other threads accessing a resource. If that is not the behaviour hat you want then a mutex is the wrong primitive - perhaps you need a semaphore rather than a mutex.
The code is broken in too many ways to understand what is going on. These two issues pop to mind.
i-- and i++ are not atomic operations, so the neither mutex or full has the values you think they do.
You are creating 20 threads but only joining on the last two.
You have no memory barriers in the code so that order of how changes to memory in an SMP system is practically undefined.
Let's imagine that I have a few worker threads such as follows:
while (1) {
do_something();
if (flag_isset())
do_something_else();
}
We have a couple of helper functions for checking and setting a flag:
void flag_set() { global_flag = 1; }
void flag_clear() { global_flag = 0; }
int flag_isset() { return global_flag; }
Thus the threads keep calling do_something() in a busy-loop and in case some other thread sets global_flag the thread also calls do_something_else() (which could for example output progress or debugging information when requested by setting the flag from another thread).
My question is: Do I need to do something special to synchronize access to the global_flag? If yes, what exactly is the minimum work to do the synchronization in a portable way?
I have tried to figure this out by reading many articles but I am still not quite sure of the correct answer... I think it is one of the following:
A: No need to synchronize because setting or clearing the flag does not create race conditions:
We just need to define the flag as volatile to make sure that it is really read from the shared memory every time it is being checked:
volatile int global_flag;
It might not propagate to other CPU cores immediately but will sooner or later, guaranteed.
B: Full synchronization is needed to make sure that changes to the flag are propagated between threads:
Setting the shared flag in one CPU core does not necessarily make it seen by another core. We need to use a mutex to make sure that flag changes are always propagated by invalidating the corresponding cache lines on other CPUs. The code becomes as follows:
volatile int global_flag;
pthread_mutex_t flag_mutex;
void flag_set() { pthread_mutex_lock(flag_mutex); global_flag = 1; pthread_mutex_unlock(flag_mutex); }
void flag_clear() { pthread_mutex_lock(flag_mutex); global_flag = 0; pthread_mutex_unlock(flag_mutex); }
int flag_isset()
{
int rc;
pthread_mutex_lock(flag_mutex);
rc = global_flag;
pthread_mutex_unlock(flag_mutex);
return rc;
}
C: Synchronization is needed to make sure that changes to the flag are propagated between threads:
This is the same as B but instead of using a mutex on both sides (reader & writer) we set it in only in the writing side. Because the logic does not require synchronization. we just need to synchronize (invalidate other caches) when the flag is changed:
volatile int global_flag;
pthread_mutex_t flag_mutex;
void flag_set() { pthread_mutex_lock(flag_mutex); global_flag = 1; pthread_mutex_unlock(flag_mutex); }
void flag_clear() { pthread_mutex_lock(flag_mutex); global_flag = 0; pthread_mutex_unlock(flag_mutex); }
int flag_isset() { return global_flag; }
This would avoid continuously locking and unlocking the mutex when we know that the flag is rarely changed. We are just using a side-effect of Pthreads mutexes to make sure that the change is propagated.
So, which one?
I think A and B are the obvious choices, B being safer. But how about C?
If C is ok, is there some other way of forcing the flag change to be visible on all CPUs?
There is one somewhat related question: Does guarding a variable with a pthread mutex guarantee it's also not cached? ...but it does not really answer this.
The 'minimum amount of work' is an explicit memory barrier. The syntax depends on your compiler; on GCC you could do:
void flag_set() {
global_flag = 1;
__sync_synchronize(global_flag);
}
void flag_clear() {
global_flag = 0;
__sync_synchronize(global_flag);
}
int flag_isset() {
int val;
// Prevent the read from migrating backwards
__sync_synchronize(global_flag);
val = global_flag;
// and prevent it from being propagated forwards as well
__sync_synchronize(global_flag);
return val;
}
These memory barriers accomplish two important goals:
They force a compiler flush. Consider a loop like the following:
for (int i = 0; i < 1000000000; i++) {
flag_set(); // assume this is inlined
local_counter += i;
}
Without a barrier, a compiler might choose to optimize this to:
for (int i = 0; i < 1000000000; i++) {
local_counter += i;
}
flag_set();
Inserting a barrier forces the compiler to write the variable back immediately.
They force the CPU to order its writes and reads. This is not so much an issue with a single flag - most CPU architectures will eventually see a flag that's set without CPU-level barriers. However the order might change. If we have two flags, and on thread A:
// start with only flag A set
flag_set_B();
flag_clear_A();
And on thread B:
a = flag_isset_A();
b = flag_isset_B();
assert(a || b); // can be false!
Some CPU architectures allow these writes to be reordered; you may see both flags being false (ie, the flag A write got moved first). This can be a problem if a flag protects, say, a pointer being valid. Memory barriers force an ordering on writes to protect against these problems.
Note also that on some CPUs, it's possible to use 'acquire-release' barrier semantics to further reduce overhead. Such a distinction does not exist on x86, however, and would require inline assembly on GCC.
A good overview of what memory barriers are and why they are needed can be found in the Linux kernel documentation directory. Finally, note that this code is enough for a single flag, but if you want to synchronize against any other values as well, you must tread very carefully. A lock is usually the simplest way to do things.
You must not cause data race cases. It is undefined behavior and the compiler is allowed to do anything and everything it pleases.
A humorous blog on the topic: http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Case 1: There is no synchronization on the flag, so anything is allowed to happen. For example, the compiler is allowed to turn
flag_set();
while(weArentBoredLoopingYet())
doSomethingVeryExpensive();
flag_clear()
into
while(weArentBoredLoopingYet())
doSomethingVeryExpensive();
flag_set();
flag_clear()
Note: this kind of race is actually very popular. Your millage may vary. One one hand, the de-facto implementation of pthread_call_once involves a data race like this. On the other hand, it is undefined behavior. On most versions of gcc, you can get away with it because gcc chooses not to exercise its right to optimize this way in many cases, but it is not "spec" code.
B: full synchronization is the right call. This is simply what you have to do.
C: Only synchronization on the writer could work, if you can prove that no one wants to read it while it is writing. The official definition of a data race (from the C++11 specification) is one thread writing to a variable while another thread can concurrently read or write the same variable. If your readers and writers all run at once, you still have a race case. However, if you can prove that the writer writes once, there is some synchronization, and then the readers all read, then the readers do not need synchronization.
As for caching, the rule is that a mutex lock/unlock synchronizes with all threads that lock/unlock the same mutex. This means you will not see any unusual caching effects (although under the hood, your processor can do spectacular things to make this run faster... it's just obliged to make it look like it wasn't doing anything special). If you don't synchronize, however, you get no guarantees that the other thread doesn't have changes to push that you need!
All of that being said, the question is really how much are you willing to rely on compiler specific behavior. If you want to write proper code, you need to do proper synchronization. If you are willing to rely on the compiler to be kind to you, you can get away with a lot less.
If you have C++11, the easy answer is to use atomic_flag, which is designed to do exactly what you want AND is designed to synchronize correctly for you in most cases.
For the example you have posted, case A is sufficient provided that ...
Getting and setting the flag takes only one CPU instruction.
do_something_else() is not dependent upon the flag being set during the execution of that routine.
If getting and/or setting the flag takes more than one CPU instruction, then you must some form of locking.
If do_something_else() is dependent upon the flag being set during the execution of that routine, then you must lock as in case C but the mutex must be locked before calling flag_isset().
Hope this helps.
Assigning incoming job to worker threads requires no locking. Typical example is webserver, where the request is catched by a main thread, and this main thread selects a worker. I'm trying explain it with some pesudo code.
main task {
// do forever
while (true)
// wait for job
while (x != null) {
sleep(some);
x = grabTheJob();
}
// select worker
bool found = false;
for (n = 0; n < NUM_OF_WORKERS; n++)
if (workerList[n].getFlag() != AVAILABLE) continue;
workerList[n].setJob(x);
workerList[n].setFlag(DO_IT_PLS);
found = true;
}
if (!found) panic("no free worker task! ouch!");
} // while forever
} // main task
worker task {
while (true) {
while (getFlag() != DO_IT_PLS) sleep(some);
setFlag(BUSY_DOING_THE_TASK);
/// do it really
setFlag(AVAILABLE);
} // while forever
} // worker task
So, if there are one flag, which one party sets is to A and another to B and C (the main task sets it to DO_IT_PLS, and the worker sets it to BUSY and AVAILABLE), there is no confilct. Play it with "real-life" example, say, when the teacher is giving different tasks to students. The teacher selects a student, gives him/her a task. Then, the teacher looks for next available student. When a student is ready, he/she gets back to the pool of available students.
UPDATE: just clarify, there are only one main() thread and several - configurable number of - worker threads. As main() runs only one instance, there is no need to sync the selection and launc of the workers.
I'm debugging a multi-threaded problem with C, pthread and Linux. On my MacOS 10.5.8, C2D, is runs fine, on my Linux computers (2-4 cores) it produces undesired outputs.
I'm not experienced, therefore I attached my code. It's rather simple: each new thread creates two more threads until a maximum is reached. So no big deal... as I thought until a couple of days ago.
Can I force single-core execution to prevent my bugs from occuring?
I profiled the programm execution, instrumenting with Valgrind:
valgrind --tool=drd --read-var-info=yes --trace-mutex=no ./threads
I get a couple of conflicts in the BSS segment - which are caused by my global structs and thread counter variales. However I could mitigate these conflicts with forced signle-core execution because I think the concurrent sheduling of my 2-4 core test-systems are responsible for my errors.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_THR 12
#define NEW_THR 2
int wait_time = 0; // log global wait time
int num_threads = 0; // how many threads there are
pthread_t threads[MAX_THR]; // global array to collect threads
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; // sync
struct thread_data
{
int nr; // nr of thread, serves as id
int time; // wait time from rand()
};
struct thread_data thread_data_array[MAX_THR+1];
void
*PrintHello(void *threadarg)
{
if(num_threads < MAX_THR){
// using the argument
pthread_mutex_lock(&mut);
struct thread_data *my_data;
my_data = (struct thread_data *) threadarg;
// updates
my_data->nr = num_threads;
my_data->time= rand() % 10 + 1;
printf("Hello World! It's me, thread #%d and sleep time is %d!\n",
my_data->nr,
my_data->time);
pthread_mutex_unlock(&mut);
// counter
long t = 0;
for(t = 0; t < NEW_THR; t++){
pthread_mutex_lock(&mut);
num_threads++;
wait_time += my_data->time;
pthread_mutex_unlock(&mut);
pthread_create(&threads[num_threads], NULL, PrintHello, &thread_data_array[num_threads]);
sleep(1);
}
printf("Bye from %d thread\n", my_data->nr);
pthread_exit(NULL);
}
return 0;
}
int
main (int argc, char *argv[])
{
long t = 0;
// srand(time(NULL));
if(num_threads < MAX_THR){
for(t = 0; t < NEW_THR; t++){
// -> 2 threads entry point
pthread_mutex_lock(&mut);
// rand time
thread_data_array[num_threads].time = rand() % 10 + 1;
// update global wait time variable
wait_time += thread_data_array[num_threads].time;
num_threads++;
pthread_mutex_unlock(&mut);
pthread_create(&threads[num_threads], NULL, PrintHello, &thread_data_array[num_threads]);
pthread_mutex_lock(&mut);
printf("In main: creating initial thread #%ld\n", t);
pthread_mutex_unlock(&mut);
}
}
for(t = 0; t < MAX_THR; t++){
pthread_join(threads[t], NULL);
}
printf("Bye from program, wait was %d\n", wait_time);
pthread_exit(NULL);
}
I hope that code isn't too bad. I didn't do too much C for a rather long time. :) The problem is:
printf("Bye from %d thread\n", my_data->nr);
my_data->nr sometimes resolves high integer values:
In main: creating initial thread #0
Hello World! It's me, thread #2 and sleep time is 8!
In main: creating initial thread #1
[...]
Hello World! It's me, thread #11 and sleep time is 8!
Bye from 9 thread
Bye from 5 thread
Bye from -1376900240 thread
[...]
I don't now more ways to profile and debug this.
If I debug this, it works - sometimes. Sometimes it doesn't :(
Thanks for reading this long question. :) I hope I didn't share too much of my currently unresolveable confusion.
Since this program seems to be just an exercise in using threads, with no actual goal, it is difficult to suggest how treat your problem rather than treat the symptom. I believe can actually pin a process or thread to a processor in Linux, but doing so for all threads removes most of the benefit of using threads, and I don't actually remember how to do it. Instead I'm going to talk about some things wrong with your program.
C compilers often make a lot of assumptions when they are doing optimizations. One of the assumptions is that unless the current code being examined looks like it might change some variable that variable does not change (this is a very rough approximation to this, and a more accurate explanation would take a very long time).
In this program you have variables which are shared and changed by different threads. If a variable is only read by threads (either const or effectively const after threads that look at it are created) then you don't have much to worry about (and in "read by threads" I'm including the main original thread) because since the variable doesn't change if the compiler only generates code to read that variable once (remembering it in a local temporary variable) or if it generates code to read it over and over the value is always the same so that calculations based on it always come out the same.
To force the compiler not do this you can use the volatile keyword. It is affixed to variable declarations just like the const keyword, and tells the compiler that the value of that variable can change at any instant, so reread it every time its value is needed, and rewrite it every time a new value for it is assigned.
NOTE that for pthread_mutex_t (and similar) variables you do not need volatile. It if were needed on the type(s) that make up pthread_mutex_t on your system volatile would have been used within the definition of pthread_mutex_t. Additionally the functions that access this type take the address of it and are specially written to do the right thing.
I'm sure now you are thinking that you know how to fix your program, but it is not that simple. You are doing math on a shared variable. Doing math on a variable using code like:
x = x + 1;
requires that you know the old value to generate the new value. If x is global then you have to conceptually load x into a register, add 1 to that register, and then store that value back into x. On a RISC processor you actually have to do all 3 of those instructions, and being 3 instructions I'm sure you can see how another thread accessing the same variable at nearly the same time could end up storing a new value in x just after we have read our value -- making our value old, so our calculation and the value we store will be wrong.
If you know any x86 assembly then you probably know that it has instructions that can do math on values in RAM (both getting from and storing the result in the same location in RAM all in one instruction). You might think that this instruction could be used for this operation on x86 systems, and you would almost be right. The problem is that this instruction is still executed in the steps that the RISC instruction would be executed in, and there are several opportunities for another processor to change this variable at the same time as we are doing our math on it. To get around this on x86 there is a lock prefix that may be applied to some x86 instructions, and I believe that glibc header files include atomic macro functions to do this on architectures that can support it, but this can't be done on all architectures.
To work right on all architectures you are going to need to:
int local_thread_count;
int create_a_thread;
pthread_mutex_lock(&count_lock);
local_thread_count = num_threads;
if (local_thread_count < MAX_THR) {
num_threads = local_thread_count + 1;
pthread_mutex_unlock(&count_lock);
thread_data_array[local_thread_count].nr = local_thread_count;
/* moved this into the creator
* since getting it in the
* child will likely get the
* wrong value. */
pthread_create(&threads[local_thread_count], NULL, PrintHello,
&thread_data_array[local_thread_count]);
} else {
pthread_mutex_unlock(&count_lock);
}
Now, since you would have changed the num_threads to volatile you can atomically test and increment the thread count in all threads. At the end of this local_thread_count should be usable as an index into the array of threads. Note that I did not create but 1 thread in this code, while yours was supposed to create several. I did this to make the example more clear, but it should not be too difficult to change it to go ahead and add NEW_THR to num_threads, but if NEW_THR is 2 and MAX_THR - num_threads is 1 (somehow) then you have to handle that correctly somehow.
Now, all of that being said, there may be another way to accomplish similar things by using semaphores. Semaphores are like mutexes, but they have a count associated with them. You would not get a value to use as the index into the array of threads (the function to read a semaphore count won't really give you this), but I thought that it deserved to be mentioned since it is very similar.
man 3 semaphore.h
will tell you a little bit about it.
num_threads should at least be marked volatile, and preferably marked atomic too (although I believe that the int is practically fine), so that at least there is a higher chance that the different threads are seeing the same values. You might want to view the assembler output to see when the writes of num_thread to memory are actually supposedly taking place.
https://computing.llnl.gov/tutorials/pthreads/#PassingArguments
that seems to be the problem. you need to malloc the thread_data struct.