What happens if I declare all variables in my C code to be volatile?
As far as I know the implementation of how volatile is honored is implementation specific and the bottom-line is that the compiler cannot trust any copies of the variable it might have . Consequently, the program might be larger in size and run slower if I am to declare all variables as volatile.
Are there any other problems if I do that?
You should concern if you are developing drivers that read flags from control registers or pointing to that location.
Some of these register has special properties, as clearing or setting other flags just by reading them. Then using volatile would just destroy your code.
I don't think it is a good idea to declare all variables as volatile. Two of the reasons were already given: bigger code and running slower.
Worse than that is not thinking. You will be last and final professional who will look at some place and making proper programming to prevent running conditions to destroy your code. Declaring all as volatile will only postpone this to a bug you won't be able to track in the future.
Declare volatile for:
Shared variables
Optimizing your code via compilers (for old compilers... Nowadays they are pretty good already for not allowing bugs when optimizing.. But need to be explicit anyway)
Multithreading shared variables
Related
I'm writing some code on Linux using pthread multithreading library and I'm currently wondering if following code is safe when compiled with -Ofast -lto -pthread.
// shared global
long shared_event_count = 0;
// ...
pthread_mutex_lock(mutex);
while (shared_event_count <= *last_seen_event_count)
pthread_cond_wait(cond, mutex);
*last_seen_event_count = shared_event_count;
pthread_mutex_unlock(mutex);
Are the calls to pthread_* functions enough or should I also include memory barrier to make sure that the change to global variable shared_event_count is actually updated during the loop? Without memory barrier the compiler would be freely to optimize the variable as register integer only, right? Of course, I could declare the shared integer as volatile which would prevent keeping the contents of that variable in register only during the loop but if I used the variable multiple times within the loop, it could make sense to only check the fresh status for the loop conditition only because that could allow for more compiler optimizations.
From testing the above code as-is, it appears that the generated code actually sees the changes made by another thread. However, is there any spec or documentation that actually guarantees this?
The common solution seems to be "don't optimize multithreaded code too aggressively" but that seems like a poor man's workaround instead of really fixing the issue. I'd rather write correct code and let the compiler optimize as much as possible within the specs (any code that gets broken by optimizations is in reality using e.g. undefined behavior of C standard as assumed stable behavior, except for some rare cases where compiler actually outputs invalid code but that seems to be very very rare these days).
I'd much prefer writing the code that works with any optimizing compiler – as such, it should only use features specified in the C standard and the pthread library documentation.
I found an interesting article at https://www.alibabacloud.com/blog/597460 which contains a trick like this:
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
This was actually used first in Linux kernel and it triggered a compiler bug in old GCC versions: https://lwn.net/Articles/624126/
As such, let's assume that the compiler is actually following the spec and doesn't contain a bug but implements every possible optimization known to man allowed by the specs. Is the above code safe with that assumption?
Also, does pthread_mutex_lock() include memory barrier by the spec or could compiler re-order the statements around it?
The compiler will not reorder memory accesses across pthread_mutex_lock() (this is an oversimplification and not strictly true, see below).
First I’ll justify this by talking about how compilers work, then I’ll justify this by looking at the spec, and then I’ll justify this by talking about convention.
I don’t think I can give you a perfect justification from the spec. In general, I would not expect a spec to give you a perfect justification—it’s turtles all the way down (do you have a spec for how to interpret the spec?), and the spec is designed to be read and understood by actual humans who understand the relevant background concepts.
How This Works
How this works—the compiler, by default, assumes that a function it doesn’t know can access any global variable. So it must emit the store to shared_event_count before the call to pthread_mutex_lock()—as far as the compiler knows, pthread_mutex_lock() reads the value of shared_event_count.
Inside pthread_mutex_lock is a memory fence for the CPU, if necessary.
Justification
From n1548:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Yes, there’s LTO. LTO can do some very surprising things. However, the fact is that writing to shared_event_count does have side effects and those side effects do affect the behavior of pthread_mutex_lock() and pthread_mutex_unlock().
The POSIX spec states that pthread_mutex_lock() provides synchronization. I could not find an explanation in the POSIX spec of what synchronization is, so this may have to suffice.
POSIX 4.12
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads:
Yes, in theory the store to shared_event_count could be moved or eliminated—but the compiler would have to somehow prove that this transformation is legal. There are various ways you could imagine this happening. For example, the compiler might be configured to do “whole program optimization”, and it may observe that shared_event_count is never read by your program—at which point, it’s a dead store and can be eliminated by the compiler.
Convention
This is how pthread_mutex_lock() has been used since the dawn of time. If compilers did this optimization, pretty much everyone’s code would break.
Volatile
I would generally not use this macro in ordinary code:
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
This is a weird thing to do, useful in weird situations. Ordinary multithreaded code is not a sufficiently weird situation to use tricks like this. Generally, you want to either use locks or atomics to read or write shared values in a multithreaded program. ACCESS_ONCE does not use a lock and it does not use atomics—so, what purpose would you use it for? Why wouldn’t you use atomic_store() or atomic_load()?
In other words, it is unclear what you would be trying to do with volatile. The volatile keyword is easily the most abused keyword in C. It is rarely useful except when writing to memory-mapped IO registers.
Conclusion
The code is fine. Don’t use volatile.
My question is targeted towards the embedded development, specifically STM32.
I am well aware of the fact that the use of volatile qualifier for a variable is crucial when dealing with a program with interrupt service routines (ISR) in order to prevent the compiler optimising out a variable that is used in both the ISR and the main thread.
In Atollic TrueSTUDIO one can turn off the GCC optimisations with the -O0 flag before the compilation. The question is, whether it is absolutely necessary to use the volatile qualifier for variables that are used inside and outside the ISR, even when the optimisations are turned off like this.
With optimizations disabled it seems unlikely that you'd need volatile. However, the compiler can do trivial optimizations even at O0. For example it might remove parts of the code that it can deduct won't be used. So not using volatile will be a gamble. I see no reason why you shouldn't be using volatile, particularly not if you run with no optimizations on anyway.
Also, regardless of optimization level, variables may be pre-fetch cached on high end MCUs with data cache. Whether volatile solves/should solve this is debatable, however.
“Programs must be written for people to read, and only incidentally for machines to execute.”
I think here We can use this quote. Imagine a situation (as user253751 mentioned) you remove keyword volatile from every variable because there is optimization enabled. Then few months later you have to turn optimization on. Do you imagine what a disaster happened?
In addition, I work with code where there is an abstraction layer above bare-metal firmware and there we use volatile keyword when variable share memory space between those layers to be sure that we use exact proper value. So there this another usage of volatile not only in ISRs, that means there is not easy to change this back and be sure that everything works ok.
Debugging code where variable should be volatile is not so hard but bugs like this looks like something magic happened and you don't know why because for example something happened one in 10k execution of that part of code.
Summary: There is no strict "ban" for removing volatile keyword when optimization is turned off but for me is VERY bad programming practice.
I am well aware of the fact that the use of volatile qualifier for a variable is crucial when dealing with a program with interrupt service routines (ISR) in order to prevent the compiler optimising out a variable that is used in both the ISR and the main thread.
You should actually keep in mind that volatile is not a synchronization construct.
It does not force any barriers, and does not prevent reordering with other non-volatile variables. It only tells the compiler not to reorder the specific access relative to other volatile variables -- and even then gives no guarantees that the variables won't be reordered by the CPU.
That's why GCC will happily compile this:
volatile int value = 0;
int old_value = 0;
void swap(void)
{
old_value = value;
value = 100;
}
to something like:
// value will be changed before old_value
mov eax, value
mov value, 100
mov old_value, eax
So if your function uses a volatile to signal something like "everything else has been written up to this point", keep in mind that it might not be enough.
Additionally, if you are writing for a multi-core microcontroller, reordering of instructions done by the CPU will render the volatile keyword completely useless.
In other words, the correct approach for dealing with synchronization is to use what the C standard is telling you to use, i.e. the <stdatomic.h> library. Constructs provided by this library are guaranteed to be portable and emit all necessary compiler and CPU barriers.
I'm writing an embedded application and almost all of my RAM is used by global byte-arrays. When my firmware boots it starts by overwriting the whole BSS section in RAM with zeroes, which is completely unnecessary in my case.
Is there some way I can instruct the compiler that it doesn't need to zero-initialize certain arrays? I know this can also be solved by declaring them as pointers, and using malloc(), but there are several reasons I want to avoid that.
The problem is that standard C enforces zero initialization of static objects. If the compiler skips it, it wouldn't conform to the C standard.
On embedded systems compilers there is usually a non-standard option "compact startup" or similar. When enabled, no initialization of static/global objects will occur at all, anywhere in the program. How to do this depends on your compiler, or in this case, on your gcc port.
If you mention which system you are using, someone might be able to provide a solution for that particular compiler port.
This means that any static/global (static storage duration) variable that you initialize explicitly will no longer be initialized. You will have to initialize it in runtime, that is, instead of static int x=1; you will have to write static int x; x=1;. It is rather common to write embedded C programs in this manner, to make them compatible with compilers where the static initialization is disabled.
It turned out that the linker-script included in my toolchain has a special "noinit" section.
__attribute__ ((section (".noinit")))
/** Forces the compiler to not automatically zero the given global
variable on startup, so that the current RAM contents is retained.
Under most conditions this value will be random due to the
behaviour of volatile memory once power is removed, but may be used in some specific
circumstances, like the passing of values back after a system watchdog reset.
So all global variabeles marked with that attribute will not be zero-initialised during boot.
The C standard REQUIRES global data to be initialized to zero.
It is possible that SOME embedded system manufacturers provide a way to bypass this option, but there are certainly many typical applications that would simply fail if the "initialize to zero" wasn't done.
Some compilers also allow you to have further sections, which may have other characteristics than the 'bss' section.
The other alternative is of course to "make your own allocation". Since it's an embedded system, I suppose you have control over how the application and data is loaded into RAM, in particular, what addresses are used for that.
So, you could use a pointer, and simply use your own mechanism for assigning the pointer to a memory region that is reserved for whatever you need large arrays for. This avoids the rather complex usage of malloc - and it gives you a more or less permanent address, so you don't have to worry about trying to find where your data is later on. This will of course have a small effect on performance, since it adds another level of indirection, but in most cases, that disappears as soon as the array is used as an argument to a function, as it decays to a pointer at that point anyways.
There are a few workarounds like:
Deleting the BSS section from the binary or setting its size to 0 or 1. This will not work if the loader must explicitly allocate memory for all sections. This will work if the loader simply copies data to the RAM.
Declaring your arrays as extern in C code and defining the symbols (along with their addresses) either in assembly code in separate assembly files or in the linker script. Again, if memory must be explicitly allocated, this won't work.
Patching or removing the relevant BSS-zeroing code either in the loader or in the startup code that is executed in your program before main().
All embedded compilers should allow a noinit segment. With the IAR AVR compiler the variables you don't want to be initialised are simply declared as follows:
__no_init uint16_t foo;
The most useful reason for this is to allow variables to maintain their values over a watchdog or brown-out reset, which of course doesn't happen in computer-based C programs, hence its omission from standard C.
Just search you compiler manual for "noinit" or something similar.
Are you sure the binary format actually includes a BSS section in the binary? In the binary formats I've worked with BSS is simply a integer that tells the kernel/loader how much memory to allocate and zero out.
There definitely is no general way in C to get uninitialized global variables. This would be a function of your compiler/linker/runtime system and highly specific to that.
with gcc, -fno-zero-initialized-in-bss
I'm new to Linux kernel development.
One thing that bothers me is a way a variables are declared and initialized.
I'm under impression that code uses variable declaration placement rules for C89/ANSI C (variables are declared at the beginning of block), while C99 relaxes the rule.
My background is C++ and there many advises from "very clever people" to declare variable as locally as possible - better declare and initialize in the same instruction:
Google C++ Style Guide
C++ Coding Standards: 101 Rules, Guidelines, and Best Practices - item 18
A good discussion about it here.
What is the accepted way to initialize variables in Linux kernel?
I couldn't find a relevant passage in the Linux kernel coding style. So, follow the convention used in existing code -- declare variables at beginning of block -- or run the risk of your code seeming out-of-place.
Reasons why variables at beginning of block is a Good Thing:
the target architecture may not have a C99 compiler
... can't think of more reasons
You should always try to declare variables as locally as possible. If you're using C++ or C99, that would usually be right before the first use.
In older C, doing that doesn't fall under "possible", and there the place to declare those variables would usually be the beginning of the current block.
(I say 'usually' because of some cases with functions and loops where it's better to make them a bit more global...)
In most normal cases, declare them in the beginning of the function where you are using them. There are exceptions, but they are rare.
if your function is short enough, the deceleration is far away from the first use anyway. If your function is longer then that - it's a good sign your function is too long.
The reason many C++ based coding standards recommend declaring close to use is that C++ data types can be much "fatter" (e.g. thing of class with multiple inheritances etc.) and so take up a lot more space. If you define such an instance at the beginning of a function but use it only much later (and maybe not at all) you are wasting a lot of RAM. This tends to be much less of an issue in C with it's native type only data types.
There is an oblique reference in the Coding Style document. It says:
Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused. You know you're brilliant, but maybe you'd like
to understand what you did 2 weeks from now.
So while C99 style in-place initialisers are handy in certain cases the first thing you should probably be asking yourself is why it's hard to have them all at the top of the function. This doesn't prevent you from declaring stuff inside block markers, for example for in-loop calculations.
In older C it is possible to declare them locally by creating a block inside the function. Blocks can be added even without ifs/for/while:
int foo(void)
{
int a;
int b;
....
a = 5 + b;
{
int c;
....
}
}
Although it doesn't look very neat, it still is possible, even in older C.
I can't speak to why they have done things one way in the Linux kernel, but in the systems we develop, we tend to not use C99-specific features in the core code. Individual applications tend to have stuff written for C99, because they will typically be deployed to one known platform, and the gcc C99 implementation is known good.
But the core code has to be deployable on whatever platform the customer demands (within reason). We have supplied systems on AIX, Solaris, Informix, Linux, Tru-64, OpenVMS(!) and the presence of C99 compliant compilers isn't always guaranteed.
The Linux kernel needs to be substantially more portable again - and particularly down to small footprint embedded systems. I guess the feature just isn't important enough to override these sorts of considerations.
im working on a c lib which would be nice to also work on embedded systems
but im not very deep into embedded development so my question
are most embedded compilers able to cope with local static variables - which i would then just assume in further development
OR
is there a #define which i can use for a #ifdef to create a global variable in case of
thx
They should, as local static variables are part of the C standard.
Of course, there is nothing preventing them from creating a C-like language that does not have all the features. But since that would be non-standard, then the way to identify that a feature is lacking would be non-standard as well.
Since static variables are part of the standard, you should be safe.
The problem with support is probably not to be found with your compiler (most of which handle the standard pretty well), but with whatever code you have to set up your runtime environment. Make sure that when you're loading the code that you properly unpack the executable, read-only data, read-write data, and zero-init sections of the executable before jumping into the C code.
Local static variables are part of th C standard, so yes.
\pedantic{
If your code is well organized, with separate files (compilation units) for different subsystems, you might do better to have a static variable with file scope. This will make it easier to factor the code that uses it into separate functions. If the code that uses the variable is complicated, this will permit you to split it into smaller static functions, which are easier to read, understand and debug.
}
Yes. local statics are really not much different than globals once the compiler is done chewing on your source code. I could think up exotic processors where globals would be an issue, but I doubt you will encounter many.
The truly interesting thing about globals on embedded processors is that you often have the option of having the compiler allocate them in ROM, EEPROMs, etc.