MSVC++ optimizing away volatile static - visual-c++-2012

I have a Visual Studio solution where certain "plugin" classes register with a factory class, so that the "plugins" can be created by name. These plugins, with the factory, reside in a static library project.
The registering takes place through a volatile static registering template class, each in its own plugin compilation unit (thus the registering process is "done" by each plugin, and there is no central information of the available plugins) like:
volatile static StaticPluginRegisterHelper<PluginClass> s_register;
but the problem is that if the PluginClass is not used verbatim elsewhere in the code, the linker opts to optimize the code away, i.e. the static above NEVER gets executed.
This seems to me like a compiler or linker bug, as I have specifically declared the static volatile (i.e. "DON'T TOUCH!") :)
The obvious workaround is to move the registration above into a compilation unit that I KNOW is always included, and that of course works. It is, however, not as neat.
Ideas?

The linker doesn't care about volatile. If it concludes that a symbol is unreferenced, it will be a candidate for eviction. To force a reference to an otherwise unreferenced symbol you can add the /INCLUDE linker option. This can be embedded in your source code as well using #pragma comment(linker,"/include:_s_register"). – IInspectable Feb 23 at 13:11
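A minimal sketch of the /INCLUDE idea from the comment above, written in C to keep the symbol name simple. The names plugin_anchor and check_anchor are hypothetical and not from the original post. The decorated name is an assumption about the target: 32-bit x86 builds prefix C symbols with an underscore, other MSVC targets do not, and a C++ object with internal linkage (like the static in the question) would need an externally visible anchor instead. The pragma presumably has to sit in a translation unit that is always linked, because directives inside an object that is never pulled out of the library are never seen.

```c
#include <assert.h>

/* Sketch only: plugin_anchor is a hypothetical symbol used for illustration. */

/* --- would live in the plugin's source file inside the static library --- */
int plugin_anchor = 1;   /* external linkage, so the linker can find it by name */

/* --- would live in a file that is always linked, e.g. next to main() --- */
#ifdef _MSC_VER
#  ifdef _M_IX86
#    pragma comment(linker, "/include:_plugin_anchor")  /* x86: leading underscore */
#  else
#    pragma comment(linker, "/include:plugin_anchor")   /* other targets: undecorated */
#  endif
#endif

/* Sanity check so the example has something observable.
   Toolchains other than MSVC simply skip the pragma above. */
int check_anchor(void)
{
    assert(plugin_anchor == 1);
    return plugin_anchor;
}
```

Forcing the anchor symbol into the link drags in the whole object file that defines it, which is what would make the registering static in that file run.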

Related

Change access/scope of C variable from static to global at compile time, without changing source code?

Let's say, in my C executable project, I use a C library to which I have source access - in fact, the .c files of the library get built as part of the build process for my C executable.
Then, let's say in some of the library .c files, let's say exotic.c, there is something like:
static struct exotic_type all_exotic_array[NUM_EXOTIC_ITEMS][8];
... which the author of the library believes should remain hidden (ergo static) - but which, it turns out, I need a reference to in my code; therefore I'd like to make this variable's symbol publicly accessible.
So, in principle I could just erase the static keyword there and recompile - and I'd get the all_exotic_array as global. However, let's say I also do not want to change the source code of the library.
So in this context, I can intervene in the Makefile, when the individual library .c files get compiled. So:
Is there a compilation switch, with which I could tell the compiler gcc (or linker ld?), while compiling exotic.c, something like: "please ignore the static storage class specifier of the symbol all_exotic_array, and make it available globally"?
If not - is there any other action I could take (maybe after compilation of exotic.c but before linking my executable), to make the symbol all_exotic_array globally available?
I have seen
Access of static variable from one file to another file
Renaming symbols at compile time without changing the code in a cross platform way
Access a global static variable from a .so file without modifying library
Keep all exported symbols when creating a shared library from a static library
... which hint that there may be a possibility to do what I want, however I couldn't find exactly what to do in my use case.

Make unresolved linking dependencies reported at runtime instead of at compilation/program load time for the purposes of unit testing

I have a home-grown unit testing framework for C programs on Linux using GCC. For each file in the project, let's say foobar.c, a matching file foobar-test.c may exist. If that is the case, both files are compiled and statically linked together into a small executable foobar-test which is then run. foobar-test.c is expected to contain main() which calls all the unit test cases defined in foobar-test.c.
Let's say I want to add a new test file barbaz-test.c to exercise sort() inside an existing production file barbaz.c:
// barbaz.c
#include "barbaz.h"
#include "log.h" // declares log() as a linking dependency coming from elsewhere
int func1() { ... res = log(); ...}
int func2() {... res = log(); ...}
int sort() {...}
Besides sort() there are several other functions in the same file which call into log() defined elsewhere in the project.
The functionality of sort() does not depend on log(), so testing it will never reach log(). Neither func1() nor func2() requires testing, and neither will be reachable from the new test case I am about to prepare.
However, the barbaz-test executable cannot be successfully linked until I provide stub implementations of all dependencies coming from barbaz.c. A usual stub looks like this:
// in barbaz-test.c
#include "barbaz.h"
#include "log.h"
int log() {
assert(false && "stub must not be reached");
return 0;
}
// Actual test case for sort() starts here
...
If barbaz.c is large (which is often the case for legacy code written with no regard to the possibility to test it), it will contain many linking dependencies. I cannot start writing a test case for sort() until I provide stubs for all of them. Additionally, it creates a burden of maintaining these stubs, i.e. updating their prototypes whenever the production counterpart is updated, not forgetting to delete stubs which no longer are required etc.
What I am looking for is an option to have late runtime binding performed for missing symbols, similarly to how it is done in dynamic languages, but for C. If an unresolved symbol is reached during the test execution, that should lead to a failure. Having a proper diagnostic about the reason would be ideal, but a simple NULL pointer dereference would be good enough.
My current solution is to automate the initial generation of the stubs' source code. It works by analyzing the linker error messages and then looking up declarations for the missing symbols in the headers. It is done in an ad-hoc manner, e.g. it involves "parsing" C code with regular expressions.
Needless to say, it is very fragile: it depends on the specific format of linker error messages and on uniformly formatted function declarations that the regexps can recognize. It does not solve the future maintenance burden such stubs create either.
Another approach is to collect stubs for the most "popular" linking dependencies into a common object file which is then always linked into the test executables. This leaves a shorter list of "unique" dependencies requiring attention for each new file. This approach breaks down when a slightly specialized version of a common stub function has to be prepared. In such cases linking would fail with "the same symbol defined twice".
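One possible way around the "defined twice" failure, not taken from the original post, is to mark the common stubs as weak so that a specialized non-weak definition in a test file quietly replaces them. The sketch below assumes GCC or Clang on an ELF platform; app_log, func1, sort_ints and run_sort_test are hypothetical stand-ins for the log(), func1() and sort() of the question. The stub counts hits instead of asserting, so the example can also show the stub being reached.

```c
#include <assert.h>
#include <stddef.h>

/* Counts calls that reach the default stub, so a test can fail on them. */
static int g_stub_hits = 0;

/* Default stub, marked weak (a GCC/Clang extension). A test file that defines
   its own non-weak app_log() would replace this one at link time instead of
   causing a duplicate-symbol error. */
__attribute__((weak)) int app_log(void)
{
    g_stub_hits++;
    return 0;
}

/* Stand-in for a production function that depends on app_log(). */
int func1(void) { return app_log(); }

/* Stand-in for the function under test; it never touches app_log(). */
void sort_ints(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

/* Sorts a small array and returns how many times a stub has been hit so far. */
int run_sort_test(void)
{
    int v[] = { 3, 1, 2 };
    sort_ints(v, 3);
    assert(v[0] == 1 && v[1] == 2 && v[2] == 3);
    return g_stub_hits;
}
```

A return value of 0 from run_sort_test() means the test never reached a stub. This does not remove the need to keep stub prototypes in sync with the production headers.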
I may have stumbled on a solution myself, inspired by this discussion: Why can't ld ignore an unused unresolved symbol?
The linker can for sure determine if certain linking dependencies are not reachable. But it is not allowed to remove them by default because the compiler has put all function symbols into the same ELF section. The linker is not allowed to modify sections, but is allowed to drop whole sections.
A solution would be to add -fdata-sections and -ffunction-sections to compiler flags, and --gc-sections to linker flags.
The former options will create one section per function during the compilation. The latter will allow linker to remove unreachable code.
I do not think these flags can be safely used in a project without doing some benchmarking of the effects first. They affect size/speed of the production code.
man gcc says:
Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker create larger object and executable files and are also slower. These options affect code generation. They prevent optimizations by the compiler and assembler using relative locations inside a translation unit since the locations are unknown until link time.
And it goes without saying that the solution only applies to the GCC/GNU Binutils toolchain.

GCC 8 for ARM in LTO mode is removing interrupt handlers and weak functions - how to prevent it?

My target device is an EFM32 Cortex-M3 based device. My toolchain is the official ARM GNU toolchain gcc-arm-none-eabi-8-2018-q4-major.
Everything works fine without LTO, but to make LTO work I have to mark all interrupt handler code with -fno-lto. I would like to get rid of this workaround.
The problem is, every interrupt handler is getting removed from the final binary. (I am checking with arm-none-eabi-nm --print-size --size-sort --radix=d -C -n file.out) This makes the resulting binary crash.
Digging deeper and after googling for similar problems:
I tried marking such functions as __attribute__((used)), __attribute((interrupt)) to no avail - the interrupt handlers are getting removed in spite of these attributes. (related Prevent GCC LTO from deleting function)
Found possibly related discussion https://bugs.launchpad.net/gcc-arm-embedded/+bug/1747966 - no solutions posted
Sample code from startup_efm32gg.c defines default interrupt handlers as such:
void DMA_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
/* many other interrupts */
void Default_Handler(void) { while (1); }
The same problem happens for regular interrupt handler definitions as well (as in, no aliases and not weak)
It might be related but it seems that weak symbols are misbehaving in LTO mode in the same way.
Thank you in advance for any ideas!
Edit: See my reply to the marked answer for a full solution!
Where are your interrupt handlers referenced from? Just like unreferenced static functions and objects will be removed from a single translation unit, external ones that are unused will be removed during LTO. In order to prevent this (and in order for your program to be valid anyway in the abstract model) there needs to be some chain of references, starting from the entry point, leading to the functions and objects; if none exists, then you're not actually using them in your program.
If the reference is from a linker script or asm source file, it's possible that this is a bug in LTO, and it's not seeing the references like it should. In this case you might be able to apply a hack like __attribute__((__used__)) to the affected function definitions. Alternatively you could make fake references to them, e.g. by storing their addresses to dummy volatile objects or using their addresses in input constraints to empty inline asm blocks. As yet another alternative, there may be a way to redo whatever you're doing with asm source files or linker scripts to build your interrupt table at the C level, with appropriate structs/arrays in special sections, so that the compiler can actually see the references without you having to fake them.
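The three suggestions in this answer can be sketched on a hosted GCC/Clang build as follows. Everything here is illustrative: Demo_IRQHandler, demo_vector_table, the section name .demo_vectors and the helper functions are hypothetical, and on a real target the section name would have to match the linker script. Note that the asker reported the used attribute alone did not help, so treat this as a sketch of the ideas and not as a guaranteed fix.

```c
#include <assert.h>

typedef void (*isr_fn)(void);

static volatile int g_irq_count = 0;

/* Hosted stand-in for a real interrupt handler; 'used' asks the compiler
   to emit it even if it sees no caller. */
__attribute__((used)) void Demo_IRQHandler(void) { g_irq_count++; }

/* C-level vector table in its own section, so the reference to the handler
   is visible to the compiler. The section name is an assumption. */
__attribute__((used, section(".demo_vectors")))
const isr_fn demo_vector_table[] = { Demo_IRQHandler };

/* Fake reference 1: store the handler's address in a volatile object. */
isr_fn volatile demo_handler_ref = Demo_IRQHandler;

/* Fake reference 2: pass the address as an input to an empty asm block. */
void keep_handlers_alive(void)
{
    __asm__ volatile("" : : "r"(Demo_IRQHandler));
}

/* Simulates the hardware dispatching through the table; returns the call count. */
int fire_demo_irq(void)
{
    keep_handlers_alive();
    demo_vector_table[0]();
    return g_irq_count;
}
```

On a hosted build this only shows that the table and the fake references compile and dispatch correctly. Whether the symbols survive LTO on the EFM32 target would still need checking with arm-none-eabi-nm as in the question.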

How to catch unintentional function interpositioning?

Reading through my book Expert C Programming, I came across the chapter on function interpositioning and how it can lead to some serious hard to find bugs if done unintentionally.
The example given in the book is the following:
my_source.c
mktemp() { ... }
main() {
mktemp();
getwd();
}
libc
mktemp(){ ... }
getwd(){ ...; mktemp(); ... }
According to the book, what happens in main() is that mktemp() (a standard C library function) is interposed by the implementation in my_source.c. Although having main() call my implementation of mktemp() is intended behavior, having getwd() (another C library function) also call my implementation of mktemp() is not.
Apparently, this example was a real life bug that existed in SunOS 4.0.3's version of lpr. The book goes on to explain the fix was to add the keyword static to the definition of mktemp() in my_source.c; although changing the name altogether should have fixed this problem as well.
This chapter leaves me with some unresolved questions that I hope you guys could answer:
Does GCC have a way to warn about function interposition? We certainly don't ever intend on this happening and I'd like to know about it if it does.
Should our software group adopt the practice of putting the keyword static in front of all functions that we don't want to be exposed?
Can interposition happen with functions introduced by static libraries?
Thanks for the help.
EDIT
I should note that my question is not just aimed at interposing over standard C library functions, but also functions contained in other libraries, perhaps 3rd party, perhaps ones created in-house. Essentially, I want to catch any instance of interpositioning regardless of where the interposed function resides.
This is really a linker issue.
When you compile a bunch of C source files the compiler will create an object file for each one. Each .o file will contain a list of the public functions in this module, plus a list of functions that are called by code in the module, but are not actually defined there i.e. functions that this module is expecting some library to provide.
When you link a bunch of .o files together to make an executable, the linker must resolve all of these missing references. This is the point where interposing can happen. If there are unresolved references to a function called "mktemp" and several libraries provide a public function with that name, which version should it use? There's no easy answer to this, and yes, odd things can happen if the wrong one is chosen.
So yes, it's a good idea in C to "static" everything unless you really do need to use it from other source files. In fact in many other languages this is the default behavior and you have to mark things "public" if you want them accessible from outside.
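As a small illustration of the "static everything" advice, assuming a single source file with hypothetical names make_temp_name and myapp_temp_name: the helper has internal linkage, so it never appears among the object file's public symbols and no library can end up bound to it by accident.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* File-local helper: 'static' gives it internal linkage, so code in other
   object files or libraries cannot resolve a call to it. */
static size_t make_temp_name(char *buf, size_t len)
{
    const char *name = "tmp-local";
    size_t need = strlen(name) + 1;   /* include the terminating NUL */

    assert(buf != NULL);
    if (len < need)
        return 0;                     /* buffer too small, nothing written */
    memcpy(buf, name, need);
    return need - 1;                  /* length of the string written */
}

/* The only symbol this file exports. */
size_t myapp_temp_name(char *buf, size_t len)
{
    return make_temp_name(buf, len);
}
```

Running nm on the resulting object file should list make_temp_name as a local symbol (lowercase t) and myapp_temp_name as a global one (uppercase T).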
It sounds like what you want is for the tools to detect name conflicts between functions, i.e. you don't want your externally accessible function names to accidentally match, and therefore 'override' or hide, functions with the same name in a library.
There was a recent SO question related to this problem: Linking Libraries with Duplicate Class Names using GCC
Using the --whole-archive option on all the libraries you link against may help (but as I mentioned in the answer over there, I really don't know how well this works or how easy it is to convince builds to apply the option to all libraries).
Purely formally, the interpositioning you describe is a straightforward violation of C language definition rules (ODR rule, in C++ parlance). Any decent compiler must either detect these situations, or provide options for detecting them. It is simply illegal to define more than one function with the same name in C language, regardless of where these functions are defined (Standard library, other user library etc.)
I understand that many platforms provide means to customize the [standard] library behavior by defining some standard functions as weak symbols. While this is indeed a useful feature, I believe the compilers must still provide the user with means to enforce the standard diagnostics (on per-function or per-library basis preferably).
So, again, you should not worry about interpositioning if you have no weak symbols in your libraries. If you do (or if you suspect that you do), you have to consult your compiler documentation to find out if it offers you a means to inspect the weak symbol resolution.
In GCC, for example, you can disable the weak symbol functionality by using -fno-weak (documented as a C++ option intended for testing), but this basically kills everything related to weak symbols, which is not always desirable.
If the function does not need to be accessed outside of the C file it lives in then yes, I would recommend making the function static.
One thing you can do to help catch this is to use an editor that has configurable syntax highlighting. I personally use SciTE, and I have configured it to display all standard library function names in red. That way, it's easy to spot if I am re-using a name I shouldn't be using (nothing is enforced by the compiler, though).
It's relatively easy to write a script that runs nm -o on all your .o files and your libraries and checks to see if an external name is defined both in your program and in a library. Just one of the many sane sensible services that the Unix linker doesn't provide because it's stuck in 1974, looking at one file at a time. (Try putting libraries in the wrong order and see if you get a useful error message!)
Interpositioning occurs when the linker is trying to link separate modules.
It cannot occur within a module. If there are duplicate symbols in a module, the linker will report this as an error.
For *nix linkers, unintended interpositioning is a problem, and it is difficult for the linker to guard against it.
For the purposes of this answer, consider the two linking stages:
The linker links translation units into modules (basically applications or libraries).
The linker resolves any remaining unfound symbols by searching in other modules.
Consider the scenario described in 'Expert C Programming' and in SiegeX's question.
The linker first tries to build the application module.
It sees that the symbol mktemp() is an external and tries to find a function definition for the symbol. The linker finds the definition for the function in the object code of the application module and marks the symbol as found.
At this stage the symbol mktemp() is completely resolved. It is not considered in any way tentative so as to allow for the possibility that another module might define the symbol.
In many ways this makes sense, since the linker should first try to resolve external symbols within the module it is currently linking. It is only unfound symbols that it searches for when linking in other modules.
Furthermore, since the symbol has been marked as resolved, the linker will use the application's mktemp() in any other cases where it needs to resolve this symbol.
Thus the application's version of mktemp() will be used by the library.
A simple way to guard against the problem is to try to make all external symbols in your application or library unique.
For modules that are only going to be shared on a limited basis, this can fairly easily be done by making sure all external symbols in your module are unique by appending a unique identifier.
For modules that are widely shared, making up unique names is a problem.

Linking two object files with a common symbol

Suppose I have two object files that both define a symbol (function) "foobar".
Is it possible to tell the linker to obey the object file order I give on the command line and always take the symbol from the first file, never the later one?
AFAIK the "weak" pragma works only on shared libraries but not on object files.
Please answer for all the C/C++ compiler/linker/operating system combinations you know, because I'm flexible and use a lot of compilers (sun studio, intel, msvc, gcc, acc).
I believe that you will need to create a static library from the second object file, and then link the first object file and then the library. If a symbol is resolved by an object file, the linker will not search the libraries for it.
Alternatively place both object files in separate static libraries, and then the link order will be determined by their occurrence in the command line.
Creating a static library from an object file will vary depending on the tool chain. In GCC use the ar utility, and for MSVC lib.exe (or use the static library project wizard).
There is a danger here, and the term for it is interpositioning.
Let me give you an example:
Suppose you have written a custom routine called malloc and you link in the standard libraries. Functions that rely on malloc (the standard function) will then use your custom version, and as an unintended side effect the code may become unstable and something will appear 'broken'.
This is just something to bear in mind.
In your case, you could 'overwrite' (I use quotes for emphasis) the other function, but then how would you know which foobar is being used? This could lead to debugging grief when trying to figure out which foobar is called.
Hope this helps,
Best regards,
Tom.
You can put it in a .a file (a static library). The symbol is then already resolved from the object file, and the linker doesn't complain about the duplicate later.
