Unable to understand following macro [duplicate] - c

This question already has answers here:
memory barrier and atomic_t on linux
(2 answers)
Closed 9 years ago.
I found the macro below while going through the kernel source code, and I am unable to understand what it does.
#define barrier() __asm__ __volatile__("" ::: "memory")
Can someone please clarify this?

This is a compiler memory barrier, used to prevent the compiler from reordering instructions. The Wikipedia article on Memory ordering says:
These barriers prevent a compiler from reordering instructions; they do not prevent reordering by the CPU.
The GNU inline assembler statement
asm volatile("" ::: "memory");
or even
__asm__ __volatile__ ("" ::: "memory");
forbids the GCC compiler from reordering read and write operations around it.
You can find details of how this works in the Clobber List section of the GCC-Inline-Assembly-HOWTO and I quote:
[...]If our instruction modifies memory in an unpredictable fashion, add "memory" to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction. We also have to add the volatile keyword if the memory affected is not listed in the inputs or outputs of the asm. [...]
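A minimal sketch of how such a barrier is used in practice (the macro is the one from the question; the surrounding code is hypothetical):

```c
/* Kernel-style compiler barrier, as quoted above. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int flag;

static int read_flag_twice(void)
{
    int a = flag;   /* without the barrier, GCC may reuse this load ... */
    barrier();      /* ... but the clobber forces a fresh read below    */
    int b = flag;
    return a + b;
}
```

If `flag` can be changed asynchronously (say, by an ISR), the barrier guarantees the second read really goes to memory instead of reusing the value already held in a register.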

It's GCC inline assembly. However, there is no actual assembly in it (the first string is empty); only the specified side effects are relevant.
That is the "memory" clobber. It tells the compiler that the assembly accesses memory (and not just registers), so the compiler must not reorder its own memory accesses across it, to avoid reading stale values or overwriting new ones.
Thus it acts, as the macro name suggests, as a compiler-level memory barrier. It is not sufficient to prevent hardware memory-access reordering, which would matter when DMA or other processors in an SMP machine are involved.
The __volatile__ makes sure that the inline assembly is not optimized away or reordered with respect to other volatile statements. It is not strictly necessary, since GCC assumes inline assembly without outputs to be volatile.
That is the implementation. Other memory-barrier primitives and their documentation can be found in Documentation/memory-barriers.txt in the Linux kernel sources.

This macro doesn't "do" anything on a language level of C, but it does prevent the compiler from reordering code around this barrier.
If you have platform knowledge on how your generated code behaves in concurrent execution contexts, then you may be able to produce a correct program as long as you can prevent the compiler from changing the order of your instructions. This barrier is a building block in such platform-specific, concurrent code.
As an example, you might want to write some kind of lock-free queue, and you're relying on the fact that your architecture (x86?) already comes with a strongly ordered memory model, so your naive stores and loads imply sufficient synchronization, provided the emitted code follows the source code order. Pairing the platform guarantees with this compiler barrier allows you to end up with correct machine code (although it's of course undefined behaviour from the perspective of C).
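A hedged sketch of such a pairing: a single-producer/single-consumer hand-off that relies on x86's strong (TSO) ordering plus a compiler barrier. The names are hypothetical, and the C standard would call this a data race; it only illustrates the idea above.

```c
/* Compiler-only barrier: no instructions, just a "memory" clobber. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int payload;
static int ready;

static void publish(int value)
{
    payload = value;
    compiler_barrier();  /* keep the store to payload before the store to ready */
    ready = 1;
}

static int consume(void)
{
    while (!ready)
        compiler_barrier();  /* force re-reading ready on each iteration */
    return payload;
}
```

On x86 the hardware already keeps the two stores in order; the barrier's only job is to stop the compiler from reordering or hoisting them. On weakly ordered CPUs this sketch would be insufficient.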

Related

Why do we need a clobbered registers list in inline assembly?

In my guide book it says:
In inline assembly, Clobbered registers list is used to tell the
compiler which registers we are using (So it can empty them before
that).
I totally don't understand this. Why does the compiler need to know? What's the problem with leaving those registers as they are? Or did they mean the compiler should back them up and restore them after the assembly code?
I hope someone can provide an example, as I have spent hours reading about the clobbered registers list with no clear answer to this problem.
The problems you'd see from failing to tell the compiler about registers you modify would be exactly the same as if you wrote a function in asm that modified some call-preserved registers (see footnote 1). See more explanation and a partial example in Why should certain registers be saved? What could go wrong if not?
In GNU inline-asm, all registers are assumed preserved, except for ones the compiler picks for "=r" / "+r" or other output operands. The compiler might be keeping a loop counter in any register, or anything else that it's going to read later and expect it to still have the value it put there before the instructions from the asm template. (With optimization disabled, the compiler won't keep variables in registers across statements, but it will when you use -O1 or higher.)
Same for all memory except for locations that are part of an "=m" or "+m" memory output operand. (Unless you use a "memory" clobber.) See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for more details.
Footnote 1:
Unlike for a function, you should not save/restore any registers with your own instructions inside the asm template. Just tell the compiler about it so it can save/restore at the start/end of the whole function after inlining, and avoid having any values it needs in them. In fact, in ABIs with a red-zone (like x86-64 System V) using push/pop inside the asm would be destructive: Using base pointer register in C++ inline asm
The design philosophy of GNU C inline asm is that it uses the same syntax as the compiler internal machine-description files. The standard use-case is for wrapping a single instruction, which is why you need early-clobber declarations if the asm code in the template string doesn't read all its inputs before it writes some registers.
The template is a black box to the compiler; it's up to you to accurately describe it to the optimizing compiler. Any mistake is effectively undefined behaviour, and leaves room for the compiler to mess up other variables in the surrounding code, potentially even in functions that call this one if you modify a call-preserved register that the compiler wasn't otherwise using.
That makes it impossible to verify correctness just by testing. You can't distinguish "correct" from "happens to work with this surrounding code and set of compiler options". This is one reason why you should avoid inline asm unless the benefits outweigh the downsides and risk of bugs. https://gcc.gnu.org/wiki/DontUseInlineAsm
GCC just does a string substitution into the template string, very much like printf, and sends the whole result (including the compiler-generated instructions for the pure C code) to the assembler as a single file. Have a look on https://godbolt.org/ sometime; even if you have invalid instructions in the inline asm, the compiler itself doesn't notice. Only when you actually assemble will there be a problem. ("binary" mode on the compiler-explorer site.)
See also https://stackoverflow.com/tags/inline-assembly/info for more links to guides.
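As a concrete illustration of the rule above, here is a hypothetical x86-64 / GCC sketch of an asm statement that scribbles on a specific register and therefore must declare it in the clobber list (GCC then guarantees it won't pick that register for an operand or keep a live value in it):

```c
/* Hypothetical example: increment via ecx, with ecx declared clobbered. */
static int add_one(int x)
{
    int result;
    __asm__ volatile(
        "mov %1, %%ecx\n\t"   /* we scribble on ecx ...                 */
        "add $1, %%ecx\n\t"
        "mov %%ecx, %0"
        : "=r"(result)
        : "r"(x)
        : "ecx");             /* ... so ecx goes in the clobber list    */
    return result;
}
```

Omitting `"ecx"` here might happen to work at -O0 and silently corrupt a loop counter at -O2, which is exactly the "happens to work" trap described above.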

Is it necessary to use the "volatile" qualifier even when the GCC optimisations are turned off?

My question is targeted towards the embedded development, specifically STM32.
I am well aware of the fact that the use of volatile qualifier for a variable is crucial when dealing with a program with interrupt service routines (ISR) in order to prevent the compiler optimising out a variable that is used in both the ISR and the main thread.
In Atollic TrueSTUDIO one can turn off the GCC optimisations with the -O0 flag before the compilation. The question is, whether it is absolutely necessary to use the volatile qualifier for variables that are used inside and outside the ISR, even when the optimisations are turned off like this.
With optimizations disabled it seems unlikely that you'd need volatile. However, the compiler can perform trivial optimizations even at -O0. For example, it might remove parts of the code that it can deduce will never be used. So not using volatile is a gamble. I see no reason why you shouldn't use volatile, particularly if you run without optimizations anyway.
Also, regardless of optimization level, variables may be prefetched into the data cache on high-end MCUs. Whether volatile solves, or should solve, this is debatable, however.
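For reference, this is the classic ISR-flag pattern the question is about, as a sketch. The "ISR" here is an ordinary function so the example can run anywhere; on real hardware it would be an interrupt handler:

```c
/* Shared between main code and the (simulated) interrupt handler. */
static volatile int data_ready;

static void fake_isr(void)          /* stands in for the real ISR */
{
    data_ready = 1;
}

static int wait_for_data(int max_polls)
{
    /* Because data_ready is volatile, every iteration re-reads memory,
     * at any optimization level. Without volatile, -O2 could legally
     * hoist the read and turn this into an infinite (or zero-pass) loop. */
    for (int i = 0; i < max_polls; i++) {
        if (data_ready)
            return 1;
    }
    return 0;
}
```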
“Programs must be written for people to read, and only incidentally for machines to execute.”
I think we can use this quote here. Imagine a situation (as user253751 mentioned) where you remove the volatile keyword from every variable because optimization is disabled. Then, a few months later, you have to turn optimization on. Can you imagine what a disaster that would be?
In addition, I work with code where there is an abstraction layer above the bare-metal firmware, and there we use the volatile keyword when a variable shares memory space between those layers, to be sure that we always read the proper value. So this is another use of volatile beyond ISRs, which means it is not easy to remove it later and be sure that everything still works.
Debugging code where a variable should have been volatile is not hard in itself, but such bugs look like magic: something strange happens and you don't know why, because it occurs, for example, once in 10k executions of that part of the code.
Summary: there is no strict "ban" on removing the volatile keyword when optimization is turned off, but to me it is VERY bad programming practice.
I am well aware of the fact that the use of volatile qualifier for a variable is crucial when dealing with a program with interrupt service routines (ISR) in order to prevent the compiler optimising out a variable that is used in both the ISR and the main thread.
You should actually keep in mind that volatile is not a synchronization construct.
It does not force any barriers, and does not prevent reordering with other non-volatile variables. It only tells the compiler not to reorder the specific access relative to other volatile variables -- and even then gives no guarantees that the variables won't be reordered by the CPU.
That's why GCC will happily compile this:
volatile int value = 0;
int old_value = 0;
void swap(void)
{
old_value = value;
value = 100;
}
to something like:
// value will be changed before old_value
mov eax, value
mov value, 100
mov old_value, eax
So if your function uses a volatile to signal something like "everything else has been written up to this point", keep in mind that it might not be enough.
Additionally, if you are writing for a multi-core microcontroller, reordering of instructions done by the CPU will render the volatile keyword completely useless.
In other words, the correct approach for dealing with synchronization is to use what the C standard is telling you to use, i.e. the <stdatomic.h> library. Constructs provided by this library are guaranteed to be portable and emit all necessary compiler and CPU barriers.
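A sketch of the same swap, rewritten with <stdatomic.h> as suggested above (the function name is hypothetical); unlike volatile, the explicit memory orders constrain both the compiler and the CPU:

```c
#include <stdatomic.h>

static atomic_int value;       /* zero-initialized, like the volatile version */
static int old_value;

static void swap_atomic(void)
{
    /* Acquire load: no later access may be hoisted above it. */
    old_value = atomic_load_explicit(&value, memory_order_acquire);
    /* Release store: no earlier access may sink below it, so the
     * write to old_value is ordered before value becomes 100. */
    atomic_store_explicit(&value, 100, memory_order_release);
}
```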

How do C developers work with assembly that's foreign to them?

I was looking through a C code snippet when I came across this line of assembly code:
char *buf = alloca(0x2000);
asm volatile("" :: "m" (buf));
I don't know what this means. In my investigation, I've learned that there are many different types of assembly language (e.g., MASM, NASM, GAS, etc.), and in my (very limited) experience, the author rarely specifies which one they're using.
What does this line mean; and more importantly, how do C developers (presumably not versed in assembly) research assembly code they come across in this manner?
The snippet is neither MASM, GAS, NASM, etc. It is inline assembly, and the syntax is documented in the C compiler's documentation.
The syntax is tricky even if you are already familiar with pure assembly because it has to specify how to connect the C part with the assembly part and vice-versa.
The statement asm volatile("" :: "m" (buf)); would typically be an empty bit of assembly (not a no-op but an actual absence of instructions), with a binding constraint ("m" (buf)) that makes the statement act as a memory barrier from the point of view of the C compiler.
EDIT: a comment by StackOverflow user Jester below a now-deleted answer says that the purpose of the statement is more likely to prevent buf, and thus the alloca call, from being optimized out by the compiler, by pretending that the assembly code "" reads from it.
I believe that the C11 standard offers cleaner ways to express memory barriers, but I haven't had the chance to investigate yet. In any case, as a way to specify a memory barrier, the above can target "GCC and compilers that aim for GCC compatibility, even if slightly old", a larger set of compilers than "C compilers correctly implementing all of the C11 standard". The Wikipedia page on C11 actually cites asm volatile ("" : : : "memory"); as an example in its discussion of memory barriers.
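For reference, a sketch of the C11 counterparts mentioned above (the function name is hypothetical):

```c
#include <stdatomic.h>

static int fence_demo(void)
{
    int x = 1;
    /* Compiler-only barrier, roughly like asm volatile("" ::: "memory"):
     * constrains compiler reordering but emits no instruction. */
    atomic_signal_fence(memory_order_seq_cst);
    x += 1;
    /* Full barrier: constrains the compiler AND emits a CPU fence
     * where the architecture needs one. */
    atomic_thread_fence(memory_order_seq_cst);
    return x;
}
```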

Using volatile variables when building an application

I am new to this field. Previously I did microcontroller programming, where I used volatile variables to prevent unwanted compiler optimization. But here I never see volatile before a variable declaration. Does that mean the compilation is done without any optimization in the Arago build? I have two questions.
How can I enable different types of optimization during compilation like speed and space optimization in Angstrom build?
If it is already an optimized compilation, why do we not need volatile declarations?
If you are using gcc for compilation, then add/modify CFLAGS:
-O2 or -O3 to enable a bunch of generic performance optimisations.
-Os to enable code-size optimisations.
A long list of flags that control individual gcc compiler optimisation options is available here.
Most often, volatile is used NOT to tune optimisation of the code, but to ensure validity of the data.
The declaration of a variable as volatile tells the compiler that the variable can be modified at any time, externally to the implementation, by:
the operating system
another thread of execution, such as
-- an interrupt routine
-- a signal handler
the underlying hardware
As the value of a volatile-qualified variable can change at any time, the underlying memory must actually be accessed whenever the variable is referenced in code.
This means the compiler cannot cache the variable in a CPU register across accesses. Marking a variable volatile forces the compiler to generate code that bypasses any register copy and actually reads the memory or hardware register mapped at the address referred to by the variable.
Also check out the various aspects of using volatile along with compiler optimisations.
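A hedged sketch of the memory-mapped-register access described above; the "register" here is a plain variable standing in for real hardware so the example can run anywhere:

```c
#include <stdint.h>

static uint32_t fake_hw_register;   /* stands in for a real MMIO register */

static void write_reg(volatile uint32_t *reg, uint32_t v)
{
    *reg = v;     /* volatile access: the store cannot be elided or merged */
}

static uint32_t read_reg(volatile uint32_t *reg)
{
    return *reg;  /* volatile access: always a fresh load from the address */
}
```

On a real MCU, `reg` would be something like `(volatile uint32_t *)0x40021000`, and the volatile qualifier is what keeps each access in the generated code.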

Suppressing instruction reordering in the WindRiver (Diab) Compiler

I am searching for the proper and accepted way to inhibit instruction reordering in the WindRiver C compiler (a.k.a. Diab C). The problem is that I have to write hardware registers several times within the same function, and I don't want the optimizer to reorder the sequence or, worse, to collapse multiple writes into one. Please do not recommend "volatile", as I don't want to rely on this invisible and unreliable prerequisite (mostly because the definition may not be under my control). I am currently using an empty inline assembler statement:
asm volatile (" ");
as a surrogate, because the compiler docs say that this will prevent reordering. OTOH, maybe there is a more common way that every decent user of WindRiver C should know of.
Thanks in advance.
