Compiler optimizations not compiling constant? - c

I have the following strings declared as constants in my code. The purpose is to provide a crude, simple way of storing metadata in the compiled output.
const char myString1[] ="abc123\0";
const char myString2[] = {'a','b','c','1','2','3','\0'};
When I inspect the output with a hex editor, I see other string constants, but "abc123" does not appear. This leads me to believe that the enabled optimizations are causing these lines not to be compiled, since they are never referenced in the program.
Is there a way in code to force this to compile, or another way (in code) of getting this metadata into the binary? I don't want to do any manipulation of the binary post-compile; the goal is to keep it as simple as possible.
Compiler flags:
-O2 -g -Wall -c -fmessage-length=0 -fno-builtin -ffunction-sections -mcpu=cortex-m3 -mthumb

I think you are looking for the used attribute:
`used'
This attribute, attached to a variable, means that the variable
must be emitted even if it appears that the variable is not
referenced.
When applied to a static data member of a C++ class template, the
attribute also means that the member will be instantiated if the
class itself is instantiated.
Apply it like this:
__attribute__((used))
const char myString1[] ="abc123\0";
__attribute__((used))
const char myString2[] = {'a','b','c','1','2','3','\0'};
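A related case, sketched below under the assumption of GCC or Clang: if the metadata is declared `static`, the compiler itself is free to drop it when nothing references it, and `used` is what keeps it. The name `build_tag` is hypothetical. Note that `used` only affects the compiler; it does not by itself stop the linker from discarding the object's section when `--gc-sections` is in effect.

```c
/* Hypothetical metadata object. Because it is 'static' and never referenced,
 * an optimizing GCC or Clang build would normally discard it; the 'used'
 * attribute tells the compiler to emit it into the object file anyway. */
__attribute__((used))
static const char build_tag[] = "abc123";
```

Running `strings` on the resulting object file should then show `abc123`.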
Given the compiler flags you posted, it is almost certainly the linker. The -ffunction-sections flag puts each function into its own section in the object files (its companion -fdata-sections does the same for data items). When the linker garbage-collects unused sections (--gc-sections), this lets it easily determine that a function or data item is not referenced and omit it from the final binary.

Use the binutils strings command to see if these strings are present in your binary.
If they have been optimized out, you can try to use the volatile qualifier when you declare them. Note that if they are not used, some compilers can still optimize them out even with the volatile qualifier.
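For a check that can run as part of a build, the test that `strings` lets you do by eye can also be done with a small C helper. This is a minimal sketch: the function name `file_contains` and its return convention are hypothetical, and it reads the whole file into memory, which is only sensible for small images such as firmware.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the bytes of 'needle' occur anywhere in the file at 'path',
 * 0 if they do not (or 'needle' is empty), and -1 on an I/O or memory error. */
int file_contains(const char *path, const char *needle)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Find the file size, then read the whole file into one buffer. */
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);

    /* Naive byte-by-byte scan; fine for a one-off check. */
    size_t n = strlen(needle);
    int found = 0;
    for (size_t i = 0; n > 0 && i + n <= got && !found; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            found = 1;
    }
    free(buf);
    return found;
}
```

For example, `file_contains("firmware.bin", "abc123")` returning 1 (with `firmware.bin` standing in for your actual output file) would confirm that the string made it into the image.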

I've come up with a solution that uses attributes and involves modifying the linker script.
First, I place the data in a custom section called ".metadata":
__attribute__ ((section(".metadata")))
Then, in the SECTIONS block of the .ld script, I added KEEP(*(.metadata)), which forces the linker to include .metadata even if it's not referenced:
.text :
{
KEEP(*(.isr_vector))
KEEP(*(.metadata))
*(.text*)
*(.rodata*)
} > MFlash32
NOTE
I found that the __attribute__ keyword had to be on the same line as the variable or else it didn't actually show up in the binary, though the .metadata section did show up in the memory map.
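One way to keep the attributes attached to each declaration is to bundle them in a macro, as in the sketch below. It assumes an ELF target with GCC or Clang and the `.metadata` section and `KEEP(*(.metadata))` rule shown above; the macro name `METADATA` and the variable names are hypothetical. `section` chooses where the data goes, `used` stops the compiler from discarding it, and the linker script's `KEEP` stops the linker from discarding it.

```c
/* Hypothetical convenience macro: place the object in .metadata and make the
 * compiler emit it even though nothing in the program references it. */
#define METADATA __attribute__((section(".metadata"), used))

/* Attributes and declaration are written together, in one declaration. */
METADATA const char fw_tag[]   = "abc123";
METADATA const char fw_board[] = "cortex-m3";
```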

If you have these variables at file scope (and they are not declared static), the compiler must emit the strings, since it can't know whether they will be used from a different compilation unit. So every .o file in which you place these variables must contain the strings.
Now, a clever linker could decide for the final binary that these constants are not needed. (I have never observed that, though.) If this is the case on your platform, you should use the variables on a "hypothetical" path that in reality will never be taken by the program. Something like:
int main(int argc, char *argv[]) {
    switch (argv[0][0]) {
    case 1: return myString1[argv[0][1]];
    case 2: return myString2[argv[0][1]];
    }
    /* ... rest of the program ... */
}

Related

Gcc Force global variable to a given address using linker only

I'm trying to force a global variable to a specific address without modifying the source code.
I'm well aware of solutions such as:
// C source code
MyStruct globalVariable __attribute__((section(".myLinkerSection")));
// Linker script
. = 0x400000;
.myLinkerSection :
{
*(.myLinkerSection)
}
But in my case I would like to do the same thing without the __attribute__((section(".myLinkerSection"))) keyword.
Is it doable?
EDIT:
I cannot modify the source code at all.
The variable is defined as follows:
file.h:
extern MyStruct globalVariable;
file.c:
MyStruct globalVariable;
I assume from the mentions of __attribute__ that you are using gcc / clang or something compatible. You can use the -fdata-sections option to make the compiler put every variable into its own section. With that option, your globalVariable, assuming it would otherwise go in .bss, will be placed in a section called .bss.globalVariable (the exact name might be platform-dependent). Then you can use your linker script to place this section at the desired address.
Note that this option will inhibit certain compiler optimizations. There is a guarantee that objects defined in the same section within the same assembler module are assembled in strict order, and that their addresses do not change after that. In some cases the compiler can take advantage of this; e.g. if it defines int variables foo and bar consecutively in the same section, then it knows their addresses are consecutive, and it can safely generate code that "hardcodes" their relative position. For instance, on some platforms such as ARM64, it takes multiple instructions to materialize the address of a global or static object. So if some function accesses both foo and bar, the compiler can materialize the address of foo, then add the fixed constant 4 to get the address of bar. But if foo and bar are in different sections, this can't be done, and you will pay the (small but nonzero) cost of materializing both addresses separately.
As such, you may want to use -fdata-sections only on the particular source files that define the particular variables of concern.
This also illustrates why you have to get the variable in its own section in order to set its address; you can't move just one variable from a section, since the compiler may have been relying on its relative position to some other variable in that section.
You can define this variable in a separate translation unit. Then list its object file in the appropriate section.

Linker: Enforce symbol ordering in resulting binary

I am building a library which roughly boils down to this:
// foo.c
extern void func();
int main() {
// ...
}
I compile with gcc -o foo func.o foo.c.
This results in a binary where the symbol func is before main (i.e. has a lower address).
However, if I add optimization, e.g. -O3, the linker decides to place func after main.
Is there a way to enforce this order?
Some linkers seem to store their symbols in a key-value table with the hashed symbol as the key. When it comes to allocation, they might follow the order of the hash table, which might not be the order in which the symbols were encountered.
I never found a way to control this behavior. It happened to me with global variables.
You might get some control if you use a specific linker script and assign segments/sections to the functions.

How to declare a variable using string concatenation and use that variable to print an integer defined as a variable in C? [duplicate]

I have a program test.c
#include <stdio.h>
int global_var=10;
int main(void) { printf("Done"); return 0; }
I compiled it with
gcc -g test.c -o test
My question:
Is there a way I can take the variable name as an argument (say "global_var") and print its value?
Thanks
No, C doesn't have introspection. Once the compiler has generated code, the program cannot look up variable names.
The way these things are usually solved is by having a collection of all special variables that need to be looked up by name, containing both the actual name as a string and the variable itself.
Usually it's an array of structures, something like
struct
{
const char *name;
int value;
} variables[] = {
{ "global_var", 10 }
};
The program can then look through the array variables to search for "global_var" and use (or change) the value in the structure.
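A variant of that table, as a minimal sketch: storing a pointer to each variable instead of a copy of its value means a lookup reads (or changes) the real global. The second variable `other_var` and the function name `lookup_int_var` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The question's variable plus a second, hypothetical one. */
int global_var = 10;
int other_var = 42;

/* Each entry pairs a name (as a string) with the address of the real variable. */
static const struct {
    const char *name;
    int *addr;
} int_vars[] = {
    { "global_var", &global_var },
    { "other_var",  &other_var  },
};

/* Returns the address of the int variable registered under 'name',
 * or NULL if there is no such entry. */
int *lookup_int_var(const char *name)
{
    for (size_t i = 0; i < sizeof int_vars / sizeof int_vars[0]; i++) {
        if (strcmp(int_vars[i].name, name) == 0)
            return int_vars[i].addr;
    }
    return NULL;
}
```

A `main` could then call `lookup_int_var(argv[1])` and print `*p` when the result is not NULL. The cost of this design is that every variable must be registered by hand in the table.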
General answer: No. There is no connection between a variable name and its string representation (you can get the string representation of a variable name at compile time with the preprocessor, though).
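The compile-time half of that remark can be sketched with the preprocessor's `#` operator, which turns a macro argument into a string literal. The macro name `FORMAT_INT` is hypothetical. The name is captured where the macro is expanded, so this does not let a program look a variable up from a string supplied at run time.

```c
#include <stdio.h>

/* Writes "<name> = <value>" into buf. #v stringizes the argument exactly as
 * written at the call site, so the variable's name comes from the source code. */
#define FORMAT_INT(buf, size, v) snprintf((buf), (size), "%s = %d", #v, (v))
```

With `printf` instead of `snprintf`, the same trick gives a quick debugging print that shows both the name and the value.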
For identifiers with external linkage, there are (platform-dependent) ways: See e.g. dlsym for POSIX systems.
You can compile with debugging information and access (most) variables by names from input. Unless you really write something like a debugger, this would be a horrible design, however (and even then, you don’t access the variables used in the debugger itself but of the programme being debugged).
Finally, you could implement your own lookup table mapping from string representations to values.
No.
We only have variable names so humans don't get confused.
After your program gets turned into assembly and eventually machine code, the computer doesn't care what you name your variables.
Alternatively you could use a structure in which you would store the value and the name as a string:
struct tag_name {
char *member1;
int member2;
};
In general, it is not possible to access at runtime global variables by name. Sometimes, it might depend upon the operating system, and how the compiler is invoked. I still assume you want to dereference a global variable, and you know its type.
Then on Linux and some other systems, you could use dlopen(3) with a NULL path (to get a handle for the executable), then use dlsym on the global variable name to get its address; you can then cast that void* pointer to a pointer of the appropriate type and dereference it. Notice that you need to know the type (or at least have a convention to encode the type of the variable in its name; C++ does that with name mangling). If you compiled and linked with debug information (i.e. with gcc -g), the type information is in the DWARF sections of your ELF executable, so there is some way to get at it.
This works if you link your executable with -rdynamic (so that its global symbols are exported to the dynamic symbol table) and with -ldl.
Another possibility might be to customize your recent GCC with your own MELT extension which would remember and later re-use some of the compiler internal representations (i.e. the GCC Tree-s related to global variables). Use MELT register_finish_decl_first function to register a handler on declarations. But this will require some work (in coding your MELT extension).
using preprocessor tricks
You could use (portable) preprocessor tricks to achieve your goals (accessing variable by name at runtime).
The simplest way might be to define and follow your own conventions. For example you could have your own globvar.def header file containing just lines like
/* file globvar.def */
MY_GLOBAL_VARIABLE(globalint,int)
MY_GLOBAL_VARIABLE(globalint2,int)
MY_GLOBAL_VARIABLE(globalstr,char*)
#undef MY_GLOBAL_VARIABLE
And you adopt the convention that all global variables are in the above globvar.def file. Then you would #include "globvar.def" several times. For instance, in your global header, expand MY_GLOBAL_VARIABLE to some extern declaration:
/* in yourheader.h */
#define MY_GLOBAL_VARIABLE(Nam,Typ) extern Typ Nam;
#include "globvar.def"
In your main.c you'll need a similar trick to define your globals, e.g. #define MY_GLOBAL_VARIABLE(Nam,Typ) Typ Nam; followed by #include "globvar.def".
Elsewhere you might define a function to get integer variables by name:
#include <string.h> /* for strcmp */
/* return the address of global int variable or else NULL */
int* global_int_var_by_name (const char*name) {
#define MY_GLOBAL_VARIABLE(Nam,Typ) \
if (!strcmp(#Typ,"int") && !strcmp(name,#Nam)) return (int*)&Nam;
#include "globvar.def"
return NULL;
}
And so on; I'm using stringification of macro arguments (the # operator).
Such preprocessor tricks are purely standard C and would work with any C99 compliant compiler.
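For a single-file experiment, the same idea can be sketched with the list kept in a macro rather than in a separate globvar.def file (the so-called X-macro pattern). This is a swapped-in variant, not the answer's exact layout; the macro names `GLOBAL_VARS`, `DEFINE_VAR` and `LOOKUP_INT`, and the initial values, are hypothetical additions.

```c
#include <stddef.h>
#include <string.h>

/* The single list of globals: name, type, initial value. */
#define GLOBAL_VARS(X)          \
    X(globalint,  int,   1)     \
    X(globalint2, int,   2)     \
    X(globalstr,  char*, "hi")

/* Define the globals: expands to "int globalint = 1; int globalint2 = 2; ..." */
#define DEFINE_VAR(Nam, Typ, Init) Typ Nam = Init;
GLOBAL_VARS(DEFINE_VAR)
#undef DEFINE_VAR

/* Return the address of the global int variable called 'name', or NULL. */
int *global_int_var_by_name(const char *name)
{
    /* #Typ and #Nam stringize the type and the variable name. */
#define LOOKUP_INT(Nam, Typ, Init) \
    if (strcmp(#Typ, "int") == 0 && strcmp(name, #Nam) == 0) return (int *)&Nam;
    GLOBAL_VARS(LOOKUP_INT)
#undef LOOKUP_INT
    return NULL;
}
```

The `(int *)` cast is only reached for entries whose stringized type is `"int"`; for `globalstr` it is compiled but never taken.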

How to compile and keep "unused" C declarations with clang -emit-llvm

Context
I'm writing a compiler for a language that requires lots of runtime functions. I'm using LLVM as my backend, so the codegen needs types for all those runtime types (functions, structs, etc) and instead of defining all of them manually using the LLVM APIs or handwriting the LLVM IR I'd like to write the headers in C and compile to the bitcode that the compiler can pull in with LLVMParseBitcodeInContext2.
Issue
The issue I'm having is that clang doesn't seem to keep any of the type declarations that aren't used by any function definitions. Clang has -femit-all-decls, which sounds like it's supposed to solve this, but unfortunately it doesn't; Googling suggests it's misnamed, as it only affects unused definitions, not declarations.
I then thought that if I compile the headers alone into .gch files, I could pull them in with LLVMParseBitcodeInContext2 the same way (since the docs say they use "the same" bitcode format). However, doing so fails with error: Invalid bitcode signature, so something must be different. Perhaps the difference is small enough to work around?
Any suggestions or relatively easy workarounds that can be automated for a complex runtime? I'd also be interested if someone has a totally alternative suggestion on approaching this general use case, keeping in mind I don't want to statically link in the runtime function bodies for every single object file I generate, just the types. I imagine this is something other compilers have needed as well so I wouldn't be surprised if I'm approaching this wrong.
e.g. given this input:
runtime.h
struct Foo {
int a;
int b;
};
struct Foo * something_with_foo(struct Foo *foo);
I need a bitcode file with this equivalent IR
runtime.ll
; ...etc...
%struct.Foo = type { i32, i32 }
declare %struct.Foo* @something_with_foo(%struct.Foo*)
; ...etc...
I could write it all by hand, but this would be duplicative as I also need to create C headers for other interop and it'd be ideal not to have to keep them in sync manually. The runtime is rather large. I guess I could also do things the other way around: write the declarations in LLVM IR and generate the C headers.
Someone else asked about this years back, but the proposed solutions are rather hacky and fairly impractical for a runtime of this size and type complexity: Clang - Compiling a C header to LLVM IR/bitcode
Clang's precompiled headers implementation does not seem to output LLVM IR, but only the AST (Abstract Syntax Tree) so that the header does not need to be parsed again:
The AST file itself contains a serialized representation of Clang’s
abstract syntax trees and supporting data structures, stored using the
same compressed bitstream as LLVM’s bitcode file format.
The underlying binary format may be the same, but it sounds like the content is different and LLVM's bitcode format is merely a container in this case. This is not very clear from the help page on the website, so I am just speculating; an LLVM/Clang expert could help clarify this point.
Unfortunately, there does not seem to be an elegant way around this. What I suggest in order to minimize the effort required to achieve what you want is to build a minimal C/C++ source file that in some way uses all the declarations that you want to be compiled to LLVM IR. For example, you just need to declare a pointer to a struct to ensure it does not get optimized away, and you may just provide an empty definition for a function to keep its signature.
Once you have a minimal source file, compile it with clang -O0 -c -emit-llvm your_file.c -o precompiled.bc to get a bitcode module with all definitions (use -S instead of -c to get human-readable LLVM IR in a .ll file).
An example from the snippet you posted:
#include <stddef.h> /* for NULL */
struct Foo {
int a;
int b;
};
// Fake function definition.
struct Foo * something_with_foo(struct Foo *foo)
{
return NULL;
}
// A global variable.
struct Foo* x;
Output that shows that definitions are kept: https://godbolt.org/g/2F89BH
So, clang doesn't actually filter out the unused declarations. It defers emitting forward declarations till their first use. Whenever a function is used it checks if it has been emitted already, if not it emits the function declaration.
You can look at these lines in the clang repo.
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
if (!FD->doesDeclarationForceExternallyVisibleDefinition())
return;
The simple fix here would be to either comment the last two lines or just add && false to the second condition.
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
if (!FD->doesDeclarationForceExternallyVisibleDefinition() && false)
return;
This will cause clang to emit a declaration as soon as it sees it. It might also change the order in which definitions appear in your .ll (or .bc) files; assuming that is not an issue, this is enough.
To make it cleaner, you could also add a new command-line flag of your own, such as --emit-all-declarations, and check it here before returning.

