gcc exception handler modify return address - c

I'm currently writing some bare-metal code for, among other targets, armv6-m, using (arm-none-eabi-)gcc as the compiler.
When implementing the exception handlers I stumbled upon __attribute__((interrupt("type"))) (manual), which tells gcc to generate a function that preserves all registers (apart from banked ones).
The problem is that this generated function always (more or less) returns execution to wherever it was before the interrupt. While desirable for regular interrupts, this is exactly what you don't want when dealing with e.g. undefined-instruction exceptions, as you then spin on said undefined instruction. While I can find a macro that is supposed to get me the return address, I can't find one to set or modify it. This seems like an obvious thing to include with e.g. the type "undef", as returning to the old pc is practically guaranteed to just retrigger the exception.
TLDR: Is there some way to modify the return address of an interrupt handler, or of a general C function, in gcc?
And please don't tell me to just write an assembly wrapper. I know that would fix it, and I already have a few of those, but if this work is already done by gcc I'd prefer not to worry about register clobbering and optimization myself.
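For reference, on ARMv6-M the hardware stacks a fixed frame on exception entry, so the return address can be patched in memory rather than in a register. Here's a minimal sketch of that idea (struct and function names are mine, not from any header; obtaining the frame pointer reliably still needs a tiny entry stub, which is the part gcc doesn't generate for you):

```c
#include <stdint.h>

/* ARMv6-M exception entry stacks r0-r3, r12, lr, pc and xPSR, so the
 * return address is the seventh word of the stacked frame. */
struct exc_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

/* Advance the stacked pc past one 16-bit Thumb instruction so the
 * fault is not retriggered on exception return. */
static void skip_faulting_insn(struct exc_frame *f)
{
    f->pc += 2;
}
```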

Related

Do I need to write explicit memory barrier for multithreaded C code?

I'm writing some code on Linux using the pthread multithreading library and I'm currently wondering if the following code is safe when compiled with -Ofast -flto -pthread.
// shared global
long shared_event_count = 0;
// ...
pthread_mutex_lock(mutex);
while (shared_event_count <= *last_seen_event_count)
pthread_cond_wait(cond, mutex);
*last_seen_event_count = shared_event_count;
pthread_mutex_unlock(mutex);
Are the calls to pthread_* functions enough, or should I also include a memory barrier to make sure that the change to the global variable shared_event_count is actually seen during the loop? Without a memory barrier the compiler would be free to keep the variable in a register only, right? Of course, I could declare the shared integer as volatile, which would prevent keeping its contents in a register during the loop, but if I used the variable multiple times within the loop, it could make sense to fetch a fresh value only for the loop condition, because that would allow more compiler optimizations.
From testing the above code as-is, it appears that the generated code actually sees the changes made by another thread. However, is there any spec or documentation that actually guarantees this?
The common solution seems to be "don't optimize multithreaded code too aggressively", but that seems like a poor man's workaround instead of really fixing the issue. I'd rather write correct code and let the compiler optimize as much as possible within the specs (any code that gets broken by optimization is in reality relying on e.g. undefined behavior of the C standard as if it were stable behavior, except for the rare cases where the compiler actually outputs invalid code, but that seems to be very rare these days).
I'd much prefer writing the code that works with any optimizing compiler – as such, it should only use features specified in the C standard and the pthread library documentation.
I found an interesting article at https://www.alibabacloud.com/blog/597460 which contains a trick like this:
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
This was actually used first in Linux kernel and it triggered a compiler bug in old GCC versions: https://lwn.net/Articles/624126/
As such, let's assume that the compiler is actually following the spec and doesn't contain a bug but implements every possible optimization known to man allowed by the specs. Is the above code safe with that assumption?
Also, does pthread_mutex_lock() include memory barrier by the spec or could compiler re-order the statements around it?
The compiler will not reorder memory accesses across pthread_mutex_lock() (this is an oversimplification and not strictly true, see below).
First I’ll justify this by talking about how compilers work, then I’ll justify this by looking at the spec, and then I’ll justify this by talking about convention.
I don’t think I can give you a perfect justification from the spec. In general, I would not expect a spec to give you a perfect justification—it’s turtles all the way down (do you have a spec for how to interpret the spec?), and the spec is designed to be read and understood by actual humans who understand the relevant background concepts.
How This Works
How this works—the compiler, by default, assumes that a function it doesn’t know can access any global variable. So it must emit the store to shared_event_count before the call to pthread_mutex_lock()—as far as the compiler knows, pthread_mutex_lock() reads the value of shared_event_count.
Inside pthread_mutex_lock is a memory fence for the CPU, if necessary.
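The compiler-barrier half can be shown in miniature. In this sketch (names are mine, not from the question), opaque_call() stands in for pthread_mutex_lock(): in real code the function lives in libpthread, a separate translation unit, so the optimizer must assume it may read or write any global:

```c
long shared_event_count = 0;

/* Stand-in for pthread_mutex_lock().  In real code this is defined in
 * another translation unit, so the optimizer cannot see into it and
 * must assume it reads and writes any global variable. */
void opaque_call(void)
{
    /* ... */
}

void producer(void)
{
    shared_event_count++;  /* as far as the optimizer knows, the call   */
    opaque_call();         /* below reads this global, so the increment */
                           /* must be completed first                   */
}
```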
Justification
From n1548:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Yes, there’s LTO. LTO can do some very surprising things. However, the fact is that writing to shared_event_count does have side effects and those side effects do affect the behavior of pthread_mutex_lock() and pthread_mutex_unlock().
The POSIX spec states that pthread_mutex_lock() provides synchronization. I could not find an explanation in the POSIX spec of what synchronization is, so this may have to suffice.
POSIX 4.12
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads:
Yes, in theory the store to shared_event_count could be moved or eliminated—but the compiler would have to somehow prove that this transformation is legal. There are various ways you could imagine this happening. For example, the compiler might be configured to do “whole program optimization”, and it may observe that shared_event_count is never read by your program—at which point, it’s a dead store and can be eliminated by the compiler.
Convention
This is how pthread_mutex_lock() has been used since the dawn of time. If compilers did this optimization, pretty much everyone’s code would break.
Volatile
I would generally not use this macro in ordinary code:
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
This is a weird thing to do, useful in weird situations. Ordinary multithreaded code is not a sufficiently weird situation to use tricks like this. Generally, you want to either use locks or atomics to read or write shared values in a multithreaded program. ACCESS_ONCE does not use a lock and it does not use atomics—so, what purpose would you use it for? Why wouldn’t you use atomic_store() or atomic_load()?
In other words, it is unclear what you would be trying to do with volatile. The volatile keyword is easily the most abused keyword in C. It is rarely useful except when writing to memory-mapped IO registers.
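If you do want lock-free access, the supported tool is C11 atomics rather than volatile. A sketch of the counter with <stdatomic.h> (names here are illustrative, not from the question):

```c
#include <stdatomic.h>

atomic_long event_count = 0;

/* Writer side: publish one new event. */
void publish_event(void)
{
    atomic_fetch_add_explicit(&event_count, 1, memory_order_release);
}

/* Reader side: a fresh load on every call, no volatile needed. */
long current_event_count(void)
{
    return atomic_load_explicit(&event_count, memory_order_acquire);
}
```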
Conclusion
The code is fine. Don’t use volatile.

Pointer to a 'constprop' function in C

I'm currently developing C code for a RISC-V SoC that I'm compiling using GCC. I have a foo(a,b) function, and I need to provide its address to the hardware through a custom CSR (Control and Status Register) using CSRRW. Up to now, I have been defining a macro, compiling/linking the program, changing the value of the macro according to the generated firmware, then compiling/linking again and checking that the address didn't change. Nevertheless, I'd like a more generic approach, hence the use of pointers.
I tried to simply assign the function's base address to a variable (addr = foo), which works most of the time. However, after compiling and linking the code with GCC (with -O3 optimization), I noticed that two versions of the function were created:
<foo>
<foo.constprop.0>
The pointer I get in the code refers to the first one; however, I would also need a variable pointing to the second one. Is there a generic way to deal with this?
EDIT:
As Nate Eldredge suggested, I created a function foo_const() calling foo(0,0). What happens is that GCC creates <foo_const> containing only a jump instruction, so a pointer to foo_const would not fix the issue.
I also tried to copy the body of foo(a,b) into foo_const(), replacing the arguments with constants. This works, as the called function is then directly the one I created and not the 'constprop' clone anymore. However, it doesn't really answer my question, as it is a workaround to avoid pointing at 'constprop' rather than a way to point to it, and there is also a negative impact on performance.
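One way to sidestep the problem (rather than point at the clone) is to forbid the clone: GCC's noclone function attribute stops the .constprop specialization, so a single <foo> symbol covers every call site, and the address you hand to the CSR is the only one that exists. A sketch, assuming losing the specialization is an acceptable trade (the related noipa attribute additionally blocks inlining and interprocedural constant propagation):

```c
/* noclone keeps GCC from emitting foo.constprop.0; every caller, and
 * the hardware CSR, then sees the single address of <foo>. */
__attribute__((noclone)) int foo(int a, int b)
{
    return a + b;
}
```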

Compile assembly code from C code without using specific register in gcc

I am injecting some control-flow monitoring code into a program. I take the assembly generated by the GCC C compiler (flag -S), then add monitoring code before every indirect branch within the application. That monitoring code needs to use some registers, so at every instrumented branch I have to push and pop the registers I use, to save the previously written values and restore them afterwards.
However, since performance is an issue, I was wondering if I could avoid the pushes and pops by telling GCC to generate the assembly without using one or two specific registers. That way I wouldn't need the push/pop around every indirect branch to preserve the existing register values.
Is there any way to do that?
See the -ffixed-reg option.
Note that if the register in question is required to be used for passing arguments, etc, this won't work (indeed, it appears that in that case, gcc will silently use it anyway).

C dummy operations

I can't imagine what the compiler does when there is no lvalue, for instance like this:
number>>1;
My intuition tells me that the compiler will discard this line due to optimizations; and if optimization is disabled, what happens?
Does it use a register to do the manipulation? Does it behave as if it were a function call, so the parameters are passed on the stack and the memory used is then marked as freed? Or does it transform it into a NOP operation?
Can I see what is happening using the Visual Studio debugger?
Thank you for your help.
In the example you give, it discards the operation. It knows the operation has no side effects and therefore doesn't need to emit the code to execute the statement in order to produce a correct program. If you disable optimizations, the compiler may still emit code. If you enable optimizations, the compiler may still emit code, too -- it's not perfect.
You can see the code the compiler emits using the /FAsc command line option of the Microsoft compiler. That option creates a listing file which has the object code output of the compiler interspersed with the related source code.
You can also use "view disassembly" in the debugger to see the code generated by the compiler.
Using either "view disassembly" or /FAsc on optimized code, I'd expect to see no emitted code from the compiler.
Assuming that number is a regular variable of integer type (not volatile) then any competent optimizing compiler (Microsoft, Intel, GNU, IBM, etc) will generate exactly NOTHING. Not a nop, no registers are used, etc.
If optimization is disabled (in a "debug build"), then the compiler may well "do what you asked for", because it doesn't realize the code has no side effects. In that case, the value will be loaded into a register and shifted right once; the result is not stored anywhere. The compiler performs dead-code elimination as one of its optimization steps, and for something this simple I'd expect it to kick in with fairly basic optimization settings. In some cases, where loops are concerned, the compiler may not optimize away the code until more advanced optimization settings are enabled.
As mentioned in the comments, if the variable is volatile, then the read of the memory represented by number will have to be made, as the compiler MUST read volatile memory.
In Visual studio, if you "view disassembly", it should show you the code that the compiler generated.
Finally, if this were C++, there is also the possibility that the variable is not a regular integer type, and an overloaded operator>> is called when the compiler sees this code; such a function may have side effects besides returning a result, so it may well have to be executed. But this can't be the case in C, since there is no operator overloading.

Why does avr-gcc bother to save the register state when calling main()?

The main() function in an avr-gcc program saves the register state on the stack, but when the runtime calls it I understand on a microcontroller there isn't anything to return to. Is this a waste of RAM? How can this state saving be prevented?
How can the compiler be sure that you aren't going to recursively call main()?
It's all about the C-standard.
Nothing forbids you from exiting main at some time. You may not do it in your program, but others may do it.
Furthermore you can register cleanup-handlers via the atexit runtime function. These functions need a defined register state to execute properly, and the only way to guarantee this is to save and restore the registers around main.
It could even be useful to do this:
I don't know about the AVR, but other microcontrollers can go into a low-power state when they're done with their job and are waiting for a reset. Doing this from a cleanup handler may be a good idea, because this handler gets called if you exit main the normal way and (as far as I know) if your program gets terminated via a kill signal.
Most likely main is just compiled in the same way as a standard function. In C it pretty much needs to be, because you might call it from somewhere.
Note that in C++ it's illegal to call main recursively, so a C++ compiler might be able to optimize this more. But in C, as your question states, it's legal (if a bad idea) to call main recursively, so it needs to be compiled the same way as any other function.
How can this state saving be prevented?
The only thing you can do is to write your own C startup routine. That means messing with assembler, but you can then JUMP to your main() instead of just CALLing it.
In my tests with avr-gcc 4.3.5, it only saves registers if not optimizing much. Normal levels (-Os or -O2) cause the push instructions to be optimized away.
One can further specify in a function declaration that it will not return with __attribute__((noreturn)). It is also useful to do full program optimization with -fwhole-program.
The initial code in avr-libc does use call to jump to main, because it is specified that main may return, and then jumps to exit (which is declared noreturn and thus generates no call). You could link your own variant if you think that is too much. exit() in turn simply disables interrupts and enters an infinite loop, effectively stopping your program, but not saving any power. That's four instructions and two bytes of stack memory overhead if your main() never returns or calls exit().
