How are local static variables thread-unsafe in C?

So looking around the internet, I couldn't find consistent and helpful information about this. So here's the issue:
Why are local static variables in C said to be thread-unsafe? I mean, static local variables are stored in the data segment, which is shared by all threads, but isn't internal linkage supposed to stop threads from stepping in each other's static variables?
This forum post seems to suggest that threads do in fact step on each other's data segment occasionally, but wouldn't such behavior clearly violate every C standard since the '90s? If such behavior were to be expected, wouldn't use of the data segment (i.e., all variables with static storage duration, including global variables) have been deprecated long ago in successive C standards?
I really don't get this, since everyone seems to have something against local static variables, but people can't seem to agree on why, and researching some of the argument shows them to be ill-conceived.
I, for one, think local static variables are a very good way to communicate information between function calls, that can really improve readability and limit scope (compared to, say, passing the information as arguments forth and writing it back on each function call).
As far as I can see, there are completely legitimate uses of local static variables. But maybe I am missing something? I would really like to know if that were the case.
[EDIT]: The answers here were pretty helpful. Thanks to everyone for the insight.

but isn't internal linkage supposed to stop threads from stepping in each other's static variables?
No, linkage has nothing to do with thread safety. Linkage and scope only control which parts of the program can refer to a variable by name, which is a different and unrelated matter.
Let's assume you have a function like this:
int do_stuff (void)
{
    static int x = 0;
    ...
    return x++;
}
and then this function is called by multiple threads, thread 1 and thread 2. The thread callback functions cannot access x directly, because it has local scope. However, they can call do_stuff() and they can do so simultaneously. And then you will get scenarios like this:
Thread 1 has executed do_stuff up to the point where it returns 0 to the caller.
Thread 1 is about to write the value 1 to x, but before it does:
Context switch: thread 2 steps in and executes do_stuff.
Thread 2 reads x; it is still 0, so it returns 0 to the caller and then increments x by 1.
x is now 1.
Thread 1 runs again. It was about to store 1 to x, so that's what it does.
Now x is still 1, although if the program had behaved correctly, it should have been 2.
This gets even worse when the access to x is done in multiple instructions, so that one thread reads "half of x" and then gets interrupted.
This is a "race condition" and the solution here is to protect x with a mutex or similar protection mechanism. Doing so will make the function thread-safe. Alternatively, do_stuff can be rewritten to not use any static storage variables or similar resources - it would then be re-entrant.
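A minimal sketch of the mutex fix described above, assuming POSIX threads. The elided `...` part of do_stuff is left out, and `worker` and `run_two_workers` are hypothetical helpers added only to exercise the function from two threads:

```c
#include <pthread.h>

/* One lock guards the static x inside do_stuff(). */
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

int do_stuff(void)
{
    static int x = 0;
    int result;

    pthread_mutex_lock(&x_lock);
    result = x++;              /* read and increment as one critical section */
    pthread_mutex_unlock(&x_lock);

    return result;
}

/* Hypothetical thread body: calls do_stuff() many times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        do_stuff();
    return NULL;
}

/* Hypothetical demo: runs two workers concurrently, then returns the next
   value of x, which equals the number of completed calls if none was lost. */
int run_two_workers(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return do_stuff();
}
```

Without the lock, the interleaving walked through above can lose increments and the final value can come out below 200000. For a single integer counter, C11's `atomic_fetch_add` from `<stdatomic.h>` would be a lighter-weight alternative to the mutex.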

isn't internal linkage supposed to stop threads from stepping in each other's static variables?
Linkage has nothing to do with concurrency: internal linkage stops translation units, not threads, from seeing each other's variables.
I, for one, think local static variables are a very good way to communicate information between function calls, that can really improve readability and limit scope
Communicating information between calls through static variables is not too different from communicating information through globals, for the same reasons: when you do that, your function becomes non-reentrant, severely limiting its uses.
The root cause of the problem is that read/write use of variables with static storage duration transforms a function from stateless to stateful. Without static variables, any state controlled by the function must be passed to it from the outside; static variables, on the other hand, let functions keep "hidden" state.
To see the consequences of keeping a hidden state, consider the strtok function: you cannot use it concurrently, because multiple threads would step on each other's state. Moreover, you cannot use it even from a single thread if you want to split each token into sub-tokens while the outer string is still being parsed, because your second-level invocation would overwrite the state of your own top-level invocation.
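A sketch of how moving the hidden state out to the caller fixes the nested case, using POSIX `strtok_r` (not ISO C, hence the feature-test macro). The input format and the `count_fields` name are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r under strict ISO modes */
#include <string.h>

/* Counts the fields in a string like "a=1;b=2;c=3": the outer loop splits on
   ';' and the inner loop splits each record on '='. Each level keeps its own
   save pointer, so the inner calls cannot clobber the outer tokenizer's state,
   which is exactly what two nested strtok() calls would do. */
int count_fields(char *text)
{
    int fields = 0;
    char *outer_save, *inner_save;

    for (char *rec = strtok_r(text, ";", &outer_save);
         rec != NULL;
         rec = strtok_r(NULL, ";", &outer_save)) {
        for (char *f = strtok_r(rec, "=", &inner_save);
             f != NULL;
             f = strtok_r(NULL, "=", &inner_save)) {
            fields++;
        }
    }
    return fields;
}
```

Because the state lives in caller-owned variables, the same function is also safe to call from several threads as long as each thread tokenizes its own buffer.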

From my point of view, the premise is wrong, or at least, static local variables are no more unsafe than any other bad design.
A bad (or thread-unsafe) software practice is sharing resources without any criteria or protection (there are different and great mechanisms for communication between threads, such as queues and mailboxes, or semaphores and mutexes if the resource has to be shared), but this is the developers' fault, because they are not using the proper mechanisms.
Actually, I cannot see your point: a static local variable has a well-defined scope and cannot be accessed outside it (and, even better, in embedded applications it is useful for avoiding memory overflows), so I see no relation between unsafe code and static local variables (or at least, not in a general sense).
If you are talking about a static local variable that can be written or read from two different tasks without protection (through a callback or whatever), that is a horrible design (and again, the developers' fault), not because static local variables are (generally) unsafe.

The behaviour of simultaneously reading from and writing to any non-atomic object is undefined in C.
A static variable makes the possibility of this happening substantially greater than an automatic or dynamic variable. And that is the crux of the problem.
So if you don't control your threading (using mutual exclusion units for example), you could put your program into an undefined state.
A sort of halfway house is thread-local storage. C11 added it to the standard as the _Thread_local storage-class specifier (with the thread_local convenience macro in <threads.h>; cf. thread_local in C++11), and before that some C compilers offered it as an extension. See, for example, https://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Thread-Local.html
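A minimal sketch of the thread-local alternative, assuming a C11 compiler and POSIX threads for creating the threads. `count_call`, `worker` and `run_tls_demo` are hypothetical names:

```c
#include <pthread.h>

/* Each thread gets its own copy of calls, so the increments never race. */
static _Thread_local int calls = 0;

int count_call(void)
{
    return ++calls;          /* touches only the calling thread's copy */
}

static void *worker(void *out)
{
    for (int i = 0; i < 1000; i++)
        count_call();
    *(int *)out = calls;     /* this thread's own total */
    return NULL;
}

/* Hypothetical demo: two threads each count 1000 calls independently. */
void run_tls_demo(int *a, int *b)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, a);
    pthread_create(&t2, NULL, worker, b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

Each worker reports 1000 and the calling thread's own copy is still untouched. Note that this changes the meaning of the state: it is no longer shared between threads, which is only correct if per-thread state is what the function actually needs.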

isn't internal linkage supposed to stop threads from stepping in each other's static variables?
Your question is tagged c. Before C11, the C language had no notion of threads at all (and C11's <threads.h> support is still optional). If your program creates any new threads, it typically does so by calling into some library at run time. The C tool chain has no way of knowing that the library routines you call create threads, and it has no way of knowing if you consider any particular static variable to be "owned" by one thread or another thread.
Every thread in your program runs in the same virtual address space as every other thread. Every thread potentially has access to all of the same variables that can be accessed by any other thread. If a variable in the program actually is used by more than one thread, it is the programmer's responsibility (not the tool chain's responsibility) to ensure that the threads use it in a safe way.
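A small demonstration of the point above, assuming POSIX threads: the name of a function-local static is hidden inside one function, but the object itself exists once per process, so every thread that calls the function gets the same address. The function names are hypothetical:

```c
#include <pthread.h>

/* The name counter is visible only here, but there is one object per process. */
int *shared_counter(void)
{
    static int counter = 0;
    return &counter;
}

static void *grab_address(void *out)
{
    *(int **)out = shared_counter();   /* report which object this thread saw */
    return NULL;
}

/* Hypothetical demo: returns 1 if both threads and the caller saw one object. */
int threads_share_static(void)
{
    int *p1 = NULL, *p2 = NULL;
    pthread_t t1, t2;
    pthread_create(&t1, NULL, grab_address, &p1);
    pthread_create(&t2, NULL, grab_address, &p2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return p1 != NULL && p1 == p2 && p1 == shared_counter();
}
```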
everyone seems to have something against local static variables,
Software developers who work in teams to develop large, long-lived software systems (think, tens of years and hundreds of thousands to tens of millions of lines of code) have some very well understood reasons to avoid using static variables. Not everyone works on systems like that, but you will meet some folk here who do.
people can't seem to agree on why
Not all software systems need to be maintained and upgraded for tens of years, and not all have tens of millions of lines of code. It's a big world. There are people out there writing code for many different reasons. They do not all have the same needs.
and researching some of the argument shows them to be ill-conceived
There are people out there writing code for many different reasons... What seems "ill-conceived" to you might be something that some other group of developers have thought long and hard about. Perhaps you do not fully understand their needs.
As far as I can see, there are completely legitimate uses of local static variables
Yes. That is why they exist. The C programming language, like many other programming languages, is a general tool that can be used in many different ways.

Related

Is it allowed to use the same static void function from multiple threads in C?

I think the question is pretty self-explanatory; however, here is an example of what I am referring to. Say we have:
static void *foo(void *bar) {
    // some random function/method/calculation/data manipulation
    return NULL;
}
Is it safe/possible to create multiple threads and use that same function? I have a very lengthy file (approaching 1000 lines) and it's starting to get tedious to scroll up and down. Long story short, I can't afford errors or unintended behavior. Or would my better bet be to simply create another C file? My mutexing and everything is solid, so I'm not too worried.
Is something like this feasible?
int main(void) {
    pthread_t A1, A2;
    pthread_create(&A1, NULL, foo, &foobar);
    pthread_create(&A2, NULL, foo, &foobar);
    pthread_join(A1, NULL);
    pthread_join(A2, NULL);
}
If I choose to head down this route, any advice/precautions?
Code is constant in C, so there is no problem for multiple threads to use the same functions. What matters is the use of data by these functions: any access to shared modifiable data must be protected.
Note that some functions such as strtok() store their context in hidden static data and thus may not be thread safe.
In your example you pass the address of the same foobar object. Unless this object is constant throughout the lifetime of both threads, there would be concurrent access to shared modifiable data which would require special handling with locks or other forms of synchronisation.
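A sketch of the simplest way around that: give each thread its own argument object, so the shared function touches no shared modifiable data and needs no locks. The `struct job` layout and the range-summing work are hypothetical stand-ins for the asker's real data manipulation:

```c
#include <pthread.h>

/* Hypothetical per-thread work item; each thread gets its own instance. */
struct job {
    int start, end;   /* half-open range [start, end) to sum */
    long sum;         /* written only by the thread that owns this job */
};

static void *foo(void *arg)
{
    struct job *j = arg;
    j->sum = 0;
    for (int i = j->start; i < j->end; i++)
        j->sum += i;
    return NULL;
}

/* Same function, two threads, two distinct argument objects. */
long sum_with_two_threads(int n)
{
    struct job a = { 0, n / 2, 0 };
    struct job b = { n / 2, n, 0 };
    pthread_t A1, A2;

    pthread_create(&A1, NULL, foo, &a);
    pthread_create(&A2, NULL, foo, &b);
    pthread_join(A1, NULL);   /* joining makes each job's result visible here */
    pthread_join(A2, NULL);
    return a.sum + b.sum;
}
```

No mutex is needed because no object is written by one thread while another thread accesses it, and pthread_join synchronizes memory before main reads the results.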
A few thousand lines is not a lot of data, a single thread is a much simpler approach to your problem. Unless the processing of this data is very CPU intensive, multiple threads will create more problems for little benefit.
Proper thread programming is non-trivial to say the least. The answer to your question is yes, it is possible to pass the same function to different threads executing in parallel, but the devil is in the detail of how you deal with accessing shared data from those threads. Such a discussion far exceeds what can be explained in an answer. Without any information as to what data manipulation is performed, no clue can even be given as to how or even what to do.

code working fine without volatile?

Hi, I recently wrote some code with the following skeleton:
variable;
callback(){
//variable updated here
}
thread_function(){
//variable used
}
main(){
//callback registered
//thread registered
}
I found that whenever the variable is updated in the callback, it is automatically updated in the thread, without declaring the variable as volatile. I am not clear how this is managed. Thank you in advance. By the way, callback() is called from a library that is compiled with the code.
Ok, first of all, it is broken. You're just lucky your compiler created code that, by accident, seems to work (or maybe unlucky, because it could hide subtle bugs in corner cases).
I found that whenever the variable is updated in the callback, it is automatically updated in the thread, without declaring the variable as volatile. I am not clear how this is managed.
Starting from here: It is one and the same variable. Threads share the same address space. So, this is what you would naively expect.
volatile is about optimizations. A C compiler is free to do a lot of modifications to your code in order to make it faster, this includes reordering of statements, not reading an accessed variable because the same value is in a register, even "unrolling" loops and a lot more. Normally, the only restriction is that the observable behavior stays the same (google for it, I don't want to write a book here.)
Reading a variable from normal memory does not create any side effects, so it can legally be left out without changing behavior. C (prior to C11) does not know about threads. If a variable was not written to in some part of code, it is assumed to hold the same value as before.
Now, what volatile gives you is telling the compiler this is not normal memory but a location that could change outside of the control of your program (like e.g. a memory mapped I/O register). With volatile, the compiler is obliged to actually fetch the variable for any read operation and to actually store it for any write operation. It's also not allowed to reorder accesses between several volatile variables. But that's all and it is not enough for synchronizing threads. For that, you also need the guarantee that no other memory accesses or code execution is reordered around accesses of your variable, even with multi-processors -> you need a memory barrier.
Note this is different from, e.g., Java, where volatile gives you what you need for thread synchronization. In C, use what the pthreads library gives you: mutexes, semaphores and condition variables.
tl;dr
Your code works correctly by accident. In C, volatile for threads is wrong (and unnecessary); use synchronization primitives such as those provided by pthreads.
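A minimal sketch of the skeleton rewritten with a pthreads mutex and condition variable instead of relying on luck or volatile. The `callback(int)` signature, the `updated` flag and `run_callback_demo` are hypothetical; in the real program the library would call the callback:

```c
#include <pthread.h>

/* Shared state from the skeleton, guarded by a mutex instead of volatile. */
static int variable;
static int updated = 0;            /* predicate checked by the waiting thread */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;

/* Stands in for callback(): publishes a new value under the lock. */
void callback(int value)
{
    pthread_mutex_lock(&lock);
    variable = value;
    updated = 1;
    pthread_cond_signal(&changed);
    pthread_mutex_unlock(&lock);
}

/* Stands in for thread_function(): waits until the callback has run. */
static void *thread_function(void *out)
{
    pthread_mutex_lock(&lock);
    while (!updated)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&changed, &lock);
    *(int *)out = variable;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Hypothetical demo: start the thread, fire the callback, collect the result. */
int run_callback_demo(int value)
{
    int seen = 0;
    pthread_t t;
    pthread_create(&t, NULL, thread_function, &seen);
    callback(value);
    pthread_join(t, NULL);
    return seen;
}
```

The mutex lock/unlock calls provide the memory synchronization that volatile does not, and the condition variable lets the thread sleep instead of polling.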

Storing thread specific variables appropriately

I am in the process of restructuring existing application code. One requirement of this restructuring is that I need to store a thread-specific variable that will be read and written quite often. I will have approximately 50 such threads. The thread-specific variable would basically be a pointer to a structure.
Here, I cannot decide how exactly I should store this variable. Should I make it a thread-specific key accessed via pthread_getspecific/pthread_setspecific? I came across some posts saying that calls to these are pretty slow. Another approach would be a global structure that stores all these thread-specific pointers in either a sorted array (to use binary search) or a hash table of key-value elements. The key would be mostly constant (thread_id) and the value could change frequently. Again, what would be the best approach here?
I know the fastest access to the required value would be to actually pass this pointer to each function and keep propagating it. But that would require a lot of code rewrite which I want to avoid.
Thanks in advance for you response.
If you are using the gcc toolchain (and some other compilers as well), you have a third option: use the __thread storage class specifier. This is very efficient. It typically works by addressing thread-local variables relative to a per-thread base register (for example, the %fs segment base on x86-64 Linux), which the operating system updates when a thread is scheduled. This way each thread sees its own copy of the variables. The cost is roughly one register update per thread switch, without the per-key lookup cost of the other approaches.
If your threads are static (that is, you launch them, and they do not exit unless the program is exiting), then you can simply use any mapping structure that you care about. The only trick is that the map needs to be populated before all the threads are allowed to run. So, you probably need a mutex and condition variable to block all the threads until the map is populated. After that, you can broadcast to all the waiting threads to go. Since the map will never change after that, each thread can read from it without any contention to retrieve their thread specific information.
If you are using GCC, then you can use a compiler specific extension. The __thread storage class extension places a global variable in a thread specific area, so that each thread has their own copy of that global.
__thread struct info_type *info;
Don't optimize prematurely, measure performance of the standard approach before you do anything. They shouldn't use more than some 100 clock cycles on average to provide you with the thread specific pointer. In many applications this is not much distinguishable from noise.
Then, I doubt that any portable solution that you can come with that goes through some sort of global variable or function can be faster than the POSIX functions. Basically they don't do much else than you propose, but are probably better optimized.
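For reference, a sketch of the standard POSIX approach being discussed (pthread_once, pthread_key_create, pthread_getspecific, pthread_setspecific). The `struct info_type` contents, `get_info` and the demo are hypothetical; error checks on calloc and the pthread calls are omitted for brevity:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread structure standing in for the question's struct. */
struct info_type {
    int thread_no;
};

static pthread_key_t info_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() runs on each thread's non-NULL value when that thread exits */
    pthread_key_create(&info_key, free);
}

/* Returns the calling thread's info, creating it on first use. */
struct info_type *get_info(void)
{
    pthread_once(&key_once, make_key);
    struct info_type *info = pthread_getspecific(info_key);
    if (info == NULL) {
        info = calloc(1, sizeof *info);
        pthread_setspecific(info_key, info);
    }
    return info;
}

struct demo { int id; int seen; };

static void *worker(void *arg)
{
    struct demo *d = arg;
    get_info()->thread_no = d->id;   /* store once ... */
    d->seen = get_info()->thread_no; /* ... read back later without passing it */
    return NULL;
}

/* Hypothetical demo: returns 1 if each thread saw only its own value. */
int run_key_demo(void)
{
    struct demo a = { 1, 0 }, b = { 2, 0 };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a.seen == 1 && b.seen == 2;
}
```

If profiling shows the lookup matters, the returned pointer can be cached in a local variable inside hot loops, which avoids repeated pthread_getspecific calls without a large rewrite.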
The best option that you have is to realize your data on the stack of each thread and pass a pointer to that data through to the functions that need it.
If you have a C11-compliant compiler (I think clang already implements that part), you can use the _Thread_local storage-class specifier, which gives you exactly the kind of variable you want. Other (pre-C11) compilers offer such features as extensions; e.g., the gcc family of compilers has __thread.
I don't understand. Is the structure meant to be thread specific? the one your pointer is pointing at?
If yes, then what is the problem with having a thread-specific structure? If it is meant to be shared (50 threads simultaneously!), you can have a global variable, although synchronizing might lead to problems as to which thread is updating the value.
Why do you want a pointer to all thread-specific data?

Using C/Pthreads: do shared variables need to be volatile?

In the C programming language and Pthreads as the threading library; do variables/structures that are shared between threads need to be declared as volatile? Assuming that they might be protected by a lock or not (barriers perhaps).
Does the pthread POSIX standard have any say about this, is this compiler-dependent or neither?
Edit to add: Thanks for the great answers. But what if you're not using locks; what if you're using barriers for example? Or code that uses primitives such as compare-and-swap to directly and atomically modify a shared variable...
As long as you are using locks to control access to the variable, you do not need volatile on it. In fact, if you're putting volatile on any variable you're probably already wrong.
https://software.intel.com/en-us/blogs/2007/11/30/volatile-almost-useless-for-multi-threaded-programming/
The answer is absolutely, unequivocally, NO. You do not need to use 'volatile' in addition to proper synchronization primitives. Everything that needs to be done is done by these primitives.
The use of 'volatile' is neither necessary nor sufficient. It's not necessary because the proper synchronization primitives are sufficient. It's not sufficient because it only disables some optimizations, not all of the ones that might bite you. For example, it does not guarantee either atomicity or visibility on another CPU.
But unless you use volatile, the compiler is free to cache the shared data in a register for any length of time... if you want your data to be predictably written to actual memory and not just cached in a register by the compiler at its discretion, you will need to mark it as volatile. Alternatively, if you only access the shared data after you have left a function modifying it, you might be fine. But I would suggest not relying on blind luck to make sure that values are written back from registers to memory.
Right, but even if you do use volatile, the CPU is free to cache the shared data in a write posting buffer for any length of time. The set of optimizations that can bite you is not precisely the same as the set of optimizations that 'volatile' disables. So if you use 'volatile', you are relying on blind luck.
On the other hand, if you use synchronization primitives with defined multi-threaded semantics, you are guaranteed that things will work. As a plus, you don't take the huge performance hit of 'volatile'. So why not do things that way?
I think one very important property of volatile is that it makes the variable be written to memory when modified, and reread from memory each time it is accessed. The other answers here mix volatile and synchronization, and it is clear from some answers other than this one that volatile is NOT a sync primitive (credit where credit is due).
Especially on register-rich machines (i.e., not x86), variables can live for quite long periods in registers, and a good compiler can cache even parts of structures or entire structures in registers. So you should use volatile, but for performance, also copy values to local variables for computation and then do an explicit write-back. Essentially, using volatile efficiently means doing a bit of load-store thinking in your C code.
In any case, you positively have to use some kind of OS-level provided sync mechanism to create a correct program.
For an example of the weakness of volatile, see my Dekker's algorithm example at http://jakob.engbloms.se/archives/65, which proves pretty well that volatile does not work to synchronize.
There is a widespread notion that the keyword volatile is good for multi-threaded programming.
Hans Boehm points out that there are only three portable uses for volatile:
volatile may be used to mark local variables in the same scope as a setjmp whose value should be preserved across a longjmp. It is unclear what fraction of such uses would be slowed down, since the atomicity and ordering constraints have no effect if there is no way to share the local variable in question. (It is even unclear what fraction of such uses would be slowed down by requiring all variables to be preserved across a longjmp, but that is a separate matter and is not considered here.)
volatile may be used when variables may be "externally modified", but the modification in fact is triggered synchronously by the thread itself, e.g. because the underlying memory is mapped at multiple locations.
A volatile sig_atomic_t may be used to communicate with a signal handler in the same thread, in a restricted manner. One could consider weakening the requirements for the sig_atomic_t case, but that seems rather counterintuitive.
If you are multi-threading for the sake of speed, slowing down code is definitely not what you want. For multi-threaded programming, there are two key issues that volatile is often mistakenly thought to address:
atomicity
memory consistency, i.e. the order of a thread's operations as seen by another thread.
Let's deal with (1) first. Volatile does not guarantee atomic reads or writes. For example, a volatile read or write of a 129-bit structure is not going to be atomic on most modern hardware. A volatile read or write of a 32-bit int is atomic on most modern hardware, but volatile has nothing to do with it. It would likely be atomic without the volatile. The atomicity is at the whim of the compiler. There's nothing in the C or C++ standards that says it has to be atomic.
Now consider issue (2). Sometimes programmers think of volatile as turning off optimization of volatile accesses. That's largely true in practice. But that's only the volatile accesses, not the non-volatile ones. Consider this fragment:
volatile int Ready;
int Message[100];
void foo( int i ) {
Message[i/10] = 42;
Ready = 1;
}
It's trying to do something very reasonable in multi-threaded programming: write a message and then send it to another thread. The other thread will wait until Ready becomes non-zero and then read Message. Try compiling this with "gcc -O2 -S" using gcc 4.0, or icc. Both will do the store to Ready first, so it can be overlapped with the computation of i/10. The reordering is not a compiler bug. It's an aggressive optimizer doing its job.
You might think the solution is to mark all your memory references volatile. That's just plain silly. As the earlier quotes say, it will just slow down your code. Worse yet, it might not fix the problem. Even if the compiler does not reorder the references, the hardware might. In this example, x86 hardware will not reorder it. Neither will an Itanium(TM) processor, because Itanium compilers insert memory fences for volatile stores. That's a clever Itanium extension. But chips like Power(TM) will reorder. What you really need for ordering are memory fences, also called memory barriers. A memory fence prevents reordering of memory operations across the fence, or in some cases, prevents reordering in one direction. Volatile has nothing to do with memory fences.
So what's the solution for multi-threaded programming? Use a library or language extension that implements the atomic and fence semantics. When used as intended, the operations in the library will insert the right fences. Some examples:
POSIX threads
Windows(TM) threads
OpenMP
TBB
Based on article by Arch Robison (Intel)
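A sketch of the Ready/Message fragment rewritten with C11 `<stdatomic.h>`, which postdates the article and is swapped in here in place of the libraries listed above. A release store paired with an acquire load guarantees that a reader who sees Ready == 1 also sees the Message write, on any hardware. `reader` and `run_publish_demo` are hypothetical, and the reader simply spins:

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_int Ready = 0;
static int Message[100];

void foo(int i)
{
    Message[i / 10] = 42;
    /* release: the Message store cannot be moved after this store */
    atomic_store_explicit(&Ready, 1, memory_order_release);
}

static void *reader(void *out)
{
    /* acquire: later reads cannot be moved before this load */
    while (atomic_load_explicit(&Ready, memory_order_acquire) == 0)
        ;                                  /* spin until published */
    *(int *)out = Message[5];
    return NULL;
}

/* Hypothetical demo: publish Message[5] from this thread, read it in another. */
int run_publish_demo(void)
{
    int seen = 0;
    pthread_t t;
    pthread_create(&t, NULL, reader, &seen);
    foo(55);                               /* 55 / 10 == 5, writes Message[5] */
    pthread_join(t, NULL);
    return seen;
}
```

Unlike volatile, the atomic operations constrain both the compiler and the CPU, so the compiler must emit whatever fences the target needs.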
In my experience, no; you just have to properly mutex yourself when you write to those values, or structure your program such that the threads will stop before they need to access data that depends on another thread's actions. My project, x264, uses this method; threads share an enormous amount of data, but the vast majority of it doesn't need mutexes because it's either read-only or a thread will wait for the data to become available and finalized before it needs to access it.
Now, if you have many threads that are all heavily interleaved in their operations (they depend on each others' output on a very fine-grained level), this may be a lot harder--in fact, in such a case I'd consider revisiting the threading model to see if it can possibly be done more cleanly with more separation between threads.
NO.
Volatile is only required when reading a memory location that can change independently of the CPU read/write commands. In the situation of threading, the CPU is in full control of read/writes to memory for each thread, therefore the compiler can assume the memory is coherent and optimizes the CPU instructions to reduce unnecessary memory access.
The primary usage for volatile is for accessing memory-mapped I/O. In this case, the underlying device can change the value of a memory location independently from CPU. If you do not use volatile under this condition, the CPU may use a previously cached memory value, instead of reading the newly updated value.
POSIX 7 guarantees that functions such as pthread_mutex_lock also synchronize memory
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11 "4.12 Memory Synchronization" says:
The following functions synchronize memory with respect to other threads:
pthread_barrier_wait()
pthread_cond_broadcast()
pthread_cond_signal()
pthread_cond_timedwait()
pthread_cond_wait()
pthread_create()
pthread_join()
pthread_mutex_lock()
pthread_mutex_timedlock()
pthread_mutex_trylock()
pthread_mutex_unlock()
pthread_spin_lock()
pthread_spin_trylock()
pthread_spin_unlock()
pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_tryrdlock()
pthread_rwlock_trywrlock()
pthread_rwlock_unlock()
pthread_rwlock_wrlock()
sem_post()
sem_timedwait()
sem_trywait()
sem_wait()
semctl()
semop()
wait()
waitpid()
Therefore if your variable is guarded between pthread_mutex_lock and pthread_mutex_unlock then it does not need further synchronization as you might attempt to provide with volatile.
Related questions:
Does guarding a variable with a pthread mutex guarantee it's also not cached?
Does pthread_mutex_lock contains memory fence instruction?
Volatile would only be useful if you need absolutely no delay between when one thread writes something and another thread reads it. Without some sort of lock, though, you have no idea of when the other thread wrote the data, only that it's the most recent possible value.
For simple values (int and float in their various sizes) a mutex might be overkill if you don't need an explicit synch point. If you don't use a mutex or lock of some sort, you should declare the variable volatile. If you use a mutex you're all set.
For complicated types, you must use a mutex. Operations on them are non-atomic, so you could read a half-changed version without a mutex.
Volatile means that we have to go to memory to get or set this value. If you don't set volatile, the compiled code might store the data in a register for a long time.
What this means is that you should mark variables that you share between threads as volatile so that you don't have situations where one thread starts modifying the value but doesn't write its result before a second thread comes along and tries to read the value.
Volatile is a compiler hint that disables certain optimizations. The output assembly of the compiler might have been safe without it but you should always use it for shared values.
This is especially important if you are NOT using the expensive thread sync objects provided by your system - you might for example have a data structure where you can keep it valid with a series of atomic changes. Many stacks that do not allocate memory are examples of such data structures, because you can add a value to the stack then move the end pointer or remove a value from the stack after moving the end pointer. When implementing such a structure, volatile becomes crucial to ensure that your atomic instructions are actually atomic.
The underlying reason is that the semantics of the C language are based on a single-threaded abstract machine, and the compiler is within its rights to transform the program as long as the program's 'observable behaviors' on the abstract machine stay unchanged. It can merge adjacent or overlapping memory accesses, redo a memory access multiple times (upon register spilling, for example), or simply discard a memory access, if it thinks the program's behaviors, when executed in a single thread, don't change. Therefore, as you may suspect, the behaviors do change if the program is actually supposed to be executing in a multi-threaded way.
As Paul McKenney pointed out in a famous Linux kernel document:
It _must_not_ be assumed that the compiler will do what you want
with memory references that are not protected by READ_ONCE() and
WRITE_ONCE(). Without them, the compiler is within its rights to
do all sorts of "creative" transformations, which are covered in
the COMPILER BARRIER section.
READ_ONCE() and WRITE_ONCE() are defined as volatile casts on referenced variables. Thus:
int y;
int x = READ_ONCE(y);
is equivalent to:
int y;
int x = *(volatile int *)&y;
So, unless you make a 'volatile' access, you are not assured that the access happens exactly once, no matter what synchronization mechanism you are using. Calling an external function (pthread_mutex_lock for example) may force the compiler to do memory accesses to global variables. But this happens only when the compiler fails to figure out whether the external function changes these global variables or not. Modern compilers employing sophisticated inter-procedural analysis and link-time optimization make this trick simply useless.
In summary, you should mark variables shared by multiple threads volatile or access them using volatile casts.
As Paul McKenney has also pointed out:
I have seen the glint in their eyes when they discuss optimization techniques that you would not want your children to know about!
But see what happens to C11/C++11.
Some people obviously are assuming that the compiler treats the synchronization calls as memory barriers. "Casey" is assuming there is exactly one CPU.
If the sync primitives are external functions and the symbols in question are visible outside the compilation unit (global names, exported pointer, exported function that may modify them) then the compiler will treat them -- or any other external function call -- as a memory fence with respect to all externally visible objects.
Otherwise, you are on your own. And volatile may be the best tool available for making the compiler produce correct, fast code. It generally won't be portable though, when you need volatile and what it actually does for you depends a lot on the system and compiler.
No.
First, volatile is not necessary. There are numerous other operations that provide guaranteed multithreaded semantics that don't use volatile. These include atomic operations, mutexes, and so on.
Second, volatile is not sufficient. The C standard does not provide any guarantees about multithreaded behavior for variables declared volatile.
So being neither necessary nor sufficient, there's not much point in using it.
One exception would be particular compilers (such as Microsoft Visual C++) where volatile does have documented multithreaded semantics.
Variables that are shared among threads should be declared 'volatile'. This tells the compiler that when one thread writes to such variables, the write should be to memory (as opposed to a register).

When is it ok to use a global variable in C?

Apparently there's a lot of variety in opinions out there, ranging from "Never! Always encapsulate (even if it's with a mere macro!)" to "It's no big deal – use them when it's more convenient than not."
So.
I'm looking for specific, concrete reasons (preferably with an example) covering:
Why global variables are dangerous
When global variables should be used in place of alternatives
What alternatives exist for those that are tempted to use global variables inappropriately
While this is subjective, I will pick one answer (the one that, to me, best represents the love/hate relationship every developer should have with globals), and the community can vote theirs to just below it.
I believe it's important for newbies to have this sort of reference, but please don't clutter it up if another answer exists that's substantially similar to yours – add a comment or edit someone else's answer.
Variables should always have the smallest scope possible. The argument behind that is that every time you increase the scope, you have more code that potentially modifies the variable, and thus more complexity is introduced into the solution.
It is thus clear that avoiding global variables is preferred if the design and implementation naturally allow it. For this reason, I prefer not to use global variables unless they are really needed.
I cannot agree with the 'never' statement either. Like any other concept, global variables are something that should be used only when needed. I would rather use global variables than some artificial constructs (like passing pointers around) that would only mask the real intent.
Some good examples where global variables are used are singleton pattern implementations or register access in embedded systems.
On how to actually detect excessive usages of global variables: inspection, inspection, inspection. Whenever I see a global variable I have to ask myself: Is that REALLY needed at a global scope?
The only way you can make global variables work is to give them names that assure they're unique.
That name usually has a prefix associated with some "module" or collection of functions for which the global variable is particularly focused or meaningful.
This means that the variable "belongs" to those functions; it's part of them. Indeed, the global can usually be "wrapped" with a little function that goes along with the other functions, in the same .h file and with the same name prefix.
Bonus.
When you do that, suddenly, it isn't really global any more. It's now part of some module of related functions.
This can always be done. With a little thinking every formerly global variable can be assigned to some collection of functions, allocated to a specific .h file, and isolated with functions that allow you to change the variable without breaking anything.
Rather than say "never use global variables", you can say "assign the global variable's responsibilities to some module where it makes the most sense."
Global variables in C are useful to make code more readable if a variable is required by multiple methods (rather than passing the variable into each method). However, they are dangerous because all locations have the ability to modify that variable, making it potentially difficult to track down bugs. If you must use a global variable, always ensure it is only modified directly by one method and have all other callers use that method. This will make it much easier to debug issues relating to changes in that variable.
Consider this koan: "if the scope is narrow enough, everything is global".
It is still very possible in this age to need to write a very quick utility program to do a one-time job.
In such cases, the energy required to create safe access to variables is greater than the energy saved by debugging problems in such a small utility.
This is the only case I can think of offhand where global variables are wise, and it is relatively rare. Useful, novel programs so small they can be held completely within the brain's short-term memory are increasingly infrequent, but they still exist.
In fact, I could boldly claim that if the program is not this small, then global variables should be illegal.
If the variable will never change, then it is a constant, not a variable.
If the variable requires universal access, then two subroutines should exist for getting and setting it, and they should be synchronized.
If the program starts small, and might be larger later, then code as if the program is large today, and abolish global variables. Not all programs will grow! (Although of course, that assumes the programmer is willing to throw away code at times.)
When you're not worried about thread-safe code: use them wherever it makes sense, in other words wherever it makes sense to express something as a global state.
When your code may be multi-threaded: avoid at all costs. Abstract global variables into work queues or some other thread-safe structure, or if absolutely necessary wrap them in locks, keeping in mind that these are likely bottlenecks in the program.
I came from the "never" camp, until I started working in the defense industry. There are some industry standards that require software to use global variables instead of dynamic (malloc in the C case) memory. I'm having to rethink my approach to dynamic memory allocation for some of the projects that I work on. If you can protect "global" memory with the appropriate semaphores, threads, etc. then this can be an acceptable approach to your memory management.
Code complexity is not the only optimization of concern. For many applications, performance optimization has a far greater priority. But more importantly, use of global variables can drastically REDUCE code complexity in many situations. There are many, perhaps specialized, situations in which global variables are not only an acceptable solution, but preferred. My favorite specialized example is their use to provide communication between the main thread of an application with an audio callback function running in a real-time thread.
It is misleading to suggest that global variables are a liability in multi-threaded applications as ANY variable, regardless of scope, is a potential liability if it is exposed to change on more than one thread.
Use global variables sparingly. Data structures should be used whenever possible to organize and isolate the use of the global namespace.
Variable scope avails programmers very useful protection, but it can have a cost. I came to write about global variables tonight because I am an experienced Objective-C programmer who often gets frustrated with the barriers object-orientation places on data access. I would argue that anti-global zealotry comes mostly from younger, theory-steeped programmers experienced principally with object-oriented APIs in isolation, without a deep, practical experience of system-level APIs and their interaction in application development. But I have to admit that I get frustrated when vendors use the namespace sloppily. Several Linux distros had "PI" and "TWOPI" predefined globally, for example, which broke much of my personal code.
When Not to Use: Global variables are dangerous because the only way to ever know how a global variable changed is to trace the entire source code of the .c file in which it is declared (or of all .c files, if it is also extern). If your code goes buggy, you have to search your entire source file(s) to see which functions change it, and when. It is a nightmare to debug when it goes wrong. We often take for granted the ingenuity behind local variables gracefully going out of scope: they're easy to trace.
When to Use: Global variables should be used when their use is not excessively masked and where the cost of using local variables is so high that it compromises readability. By this, I mean the necessity of adding extra parameters and return values to functions, passing pointers around, and so on. Three classic examples:
First, the push and pop functions of a stack share the stack itself. Of course, I could use local variables, but then I would have to pass a pointer around as an additional parameter.
Second, K&R's "The C Programming Language" defines getch() and ungetch() functions that share a global character buffer. Once again, we don't need to make it global, but is the added complexity worth it when it's pretty hard to misuse the buffer?
Third, in the embedded space, Arduino hobbyists often have many functions called from loop() that all call millis(), which returns the number of milliseconds since the board started running. Because clock speed isn't infinite, the value of millis() will differ within a single pass through the loop. To make it constant, take a snapshot of the time before every pass and save it in a global variable; every function that reads the snapshot then sees the same time.
Alternatives: Not much. Stick to local scoping as much as possible, especially at the beginning of the project, rather than the other way around. As the project grows, if you feel complexity can be lowered using global variables, then do so, but only if it meets the requirements of point two ("When to Use"). And remember, using local scope and having more complicated code is the lesser evil compared to irresponsibly using global variables.
You need to consider in what context the global variable will be used as well. In the future, will you want to duplicate this code?
For example, suppose you are using a socket within the system to access a resource. Will you want to access more than one of these resources in the future? If the answer is yes, I would stay away from globals in the first place so that a major refactor will not be required.
Global variables should be used when multiple functions need to access the data or write to an object. For example, you may need to pass data or a reference to many functions: a single log file, a connection pool, or a hardware reference that needs to be accessed across the application. A global prevents very long function declarations and large allocations of duplicated data.
You should typically not use global variables unless absolutely necessary because global variables are only cleaned up when explicitly told to do so or your program ends. If you are running a multi-threaded application, multiple functions can write to the variable at the same time. If you have a bug, tracking that bug down can be more difficult because you don't know which function is changing the variable. You also run into the problem of naming conflicts unless you use a naming convention that explicitly gives global variables a unique name.
They're a tool like any other: usually overused, but I don't think they are evil.
For example I have a program that really acts like an online database. The data is stored in memory but other programs can manipulate it. There are internal routines that act much like stored procedures and triggers in a database.
This program has hundreds of global variables, but if you think about it, what is a database but a huge number of global variables?
This program has been in use for about ten years now through many versions and it's never been a problem and I'd do it again in a minute.
I will admit that in this case the global vars are objects that have methods used for changing the object's state. So tracking down who changed the object while debugging isn't a problem since I can always set a break point on the routine that changes the object's state. Or even simpler I just turn on the built in logging that logs the changes.
When you declare constants.
I can think of several reasons:
debugging/testing purposes (warning - haven't tested this code):
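#include <stdio.h>
#define MAX_INPUT 46 /* fib(46) is the largest Fibonacci number that fits in a 32-bit int */
/* Global call counter shared by fib1() and fib2(); reset before each test.
   unsigned long long because fib1(46) makes about 3.7 billion calls. */
unsigned long long runs = 0;
/* Naive doubly recursive Fibonacci. */
int fib1(int n){
    ++runs;
    return n > 2 ? fib1(n-1) + fib1(n-2) : 1;
}
/* Memoized Fibonacci: cache[i] holds fib(i+1), and *len entries are valid. */
int fib2(int n, int *cache, int *len){
    ++runs;
    if(n <= 2){
        if(*len >= 2)
            return 1;
        *len = 2;
        return cache[0] = cache[1] = 1;
    }else if(*len >= n)
        return cache[n-1];
    else{
        if(*len != n-1)
            fib2(n-1, cache, len); /* fill the cache up to fib(n-1) */
        *len = n;
        return cache[n-1] = cache[n-2] + cache[n-3];
    }
}
int main(void){
    int n;
    /* Keep reading until EOF, invalid input, or n outside 1..MAX_INPUT. */
    while(scanf("%i", &n) == 1 && n >= 1 && n <= MAX_INPUT){
        int cache[MAX_INPUT];
        int len = 0;
        runs = 0;
        printf("fib1(%i)==%i", n, fib1(n));
        printf(", %llu run(s)\n", runs);
        runs = 0;
        printf("fib2(%i)==%i", n, fib2(n, cache, &len)); /* cache decays to int * */
        printf(", %llu run(s)\n", runs);
    }
    return 0;
}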
I used scoped variables for fib2, but that's one more scenario where globals might be useful (pure mathematical functions which need to store data to avoid taking forever).
programs used only once (e.g. for a contest), or when development time needs to be shortened
globals are useful as typed constants, where a function somewhere requires an int * instead of an int.
I generally avoid globals if I intend to use the program for more than a day.
I believe we have an edge case in our firm that keeps me out of the "never use global variables" camp.
We need to write an embedded application that runs in our box and pulls medical data from devices in a hospital.
It should run indefinitely, even when a medical device is unplugged, the network is gone, or the settings of our box change. Settings are read from a .txt file, which can be changed at runtime, preferably with no trouble.
That is why the Singleton pattern is of no use to me. So we go back from time to time (after every 1000 readings) and reread the settings, like so:
public static SettingsForIncubator settings; // shared by all the functions below
public static void main(String[] args) {
    while (true) {
        // Assign the static field; declaring a new local variable here would shadow it
        settings = getSettings(args);
        int counter = 0;
        while (medicalDeviceIsGivingData && counter < 1000) {
            readData(); // uses settings
            // a lot of other functions that use settings
            counter++;
        }
    }
}
Global constants are useful: you get more type safety than with preprocessor macros, and it's still just as easy to change the value if you decide you need to.
Global variables have some uses, for example if the operation of many parts of a program depends on a particular state in a state machine. As long as you limit the number of places that can MODIFY the variable, tracking down bugs involving it isn't too bad.
Global variables become dangerous almost as soon as you create more than one thread. In that case, you really should limit the scope to (at most) a file-scope variable (by declaring it static) with getter/setter functions that protect it from concurrent access where that could be dangerous.
I'm in the "never" camp here; if you need a global variable, at least use a singleton pattern. That way, you reap the benefits of lazy instantiation, and you don't clutter up the global namespace.

Resources