It looks like GCC with -O2 and __attribute__((weak)) produces different results depending on how you reference your weak symbols. Consider this:
$ cat weak.c
#include <stdio.h>
extern const int weaksym1;
const int weaksym1 __attribute__(( weak )) = 0;
extern const int weaksym2;
const int weaksym2 __attribute__(( weak )) = 0;
extern int weaksym3;
int weaksym3 __attribute__(( weak )) = 0;
void testweak(void)
{
if ( weaksym1 == 0 )
{
printf( "0\n" );
}
else
{
printf( "1\n" );
}
printf( "%d\n", weaksym2 );
if ( weaksym3 == 0 )
{
printf( "0\n" );
}
else
{
printf( "1\n" );
}
}
$ cat test.c
extern const int weaksym1;
const int weaksym1 = 1;
extern const int weaksym2;
const int weaksym2 = 1;
extern int weaksym3;
int weaksym3 = 1;
extern void testweak(void);
int main(void)
{
testweak();
return 0;
}
$ make
gcc -c weak.c
gcc -c test.c
gcc -o test test.o weak.o
$ ./test
1
1
1
$ make ADD_FLAGS="-O2"
gcc -O2 -c weak.c
gcc -O2 -c test.c
gcc -O2 -o test test.o weak.o
$ ./test
0
1
1
The question is: why does the last "./test" produce "0 1 1" rather than "1 1 1"?
gcc version 5.4.0 (GCC)
It looks like the optimizer has trouble with symbols that are declared const and have their weak definition within the same compilation unit: it appears to treat the weak initializer as the final value and evaluate the comparison at compile time.
As a workaround, you can create a separate C file and move the const weak definitions there:
weak_def.c
const int weaksym1 __attribute__(( weak )) = 0;
const int weaksym2 __attribute__(( weak )) = 0;
The same issue is described in the question "GCC weak attribute on constant variables".
Summary:
Weak symbols only work correctly if you do not initialize them to a value. The linker takes care of the initialization (and it always initializes them to zero if no normal symbol of the same name exists).
If you try to initialize a weak symbol to any value, even to zero as OP did, the C compiler is free to make weird assumptions about its value. The compiler has no distinction between weak and normal symbols; it is all (dynamic) linker magic.
To fix, remove the initialization (= 0) from any symbol you declare weak:
extern const int weaksym1;
const int weaksym1 __attribute__((__weak__));
extern const int weaksym2;
const int weaksym2 __attribute__((__weak__));
extern int weaksym3;
int weaksym3 __attribute__((__weak__));
Detailed description:
The C language has no concept of a "weak symbol". It is a functionality provided by the ELF file format, and (dynamic) linkers that use the ELF file format.
As the nm(1) man page describes under the "V" symbol type,
When a weak defined symbol is linked with a normal defined symbol, the
normal defined symbol is used with no error. When a weak undefined symbol
is linked and the symbol is not defined, the value of the weak symbol
becomes zero with no error.
the weak symbol declaration should not be initialized to any value, because it will have value zero if the process is not linked with a normal symbol of the same name. ("defined" in the man 1 nm page refers to a symbol existing in the ELF symbol table.)
The "weak symbol" feature was designed to work with existing C compilers. Remember, the C compilers do not have any distinction between "weak" and "normal" symbols.
To ensure this would work without running afoul of the C compiler behaviour, the "weak" symbol must be uninitialized, so that the C compiler cannot make any assumptions as to its value. Instead, it will generate code that obtains the address of the symbol as usual -- and that's where the normal/weak symbol lookup magic happens.
This also means that weak symbols can only be "auto-initialized" to zero, and not any other value, unless "overridden" by a normal, initialized symbol of the same name.
I have an ARM project, where I would like to keep certain unused variables and their data, until the time they are used.
I have seen "prevent gcc from removing an unused variable":
__attribute__((used)) did not work for me with a global variable (the documentation does imply it only works on functions) (arm-none-eabi gcc 7), but putting the symbol in a different section via __attribute__((section(".data"))) did work. This is presumably because the linker is only able to strip symbols when they are given their own section via -fdata-sections. I do not like it, but it worked.
So, I tried this approach, but the variables were not kept, and I think this is because something in that project enables -Wl,--gc-sections during linking. Here is a minimal example showing what I tried to do: the main file only refers to the header where the variables to be "kept" are declared as extern, and otherwise does not use these variables; the variables themselves are defined in a separate .c file:
test_opt.c
#include <stdio.h>
#include "test_opt.h"
const char greeting[] = "Hello World - am used";
int main(void) {
printf("%s!\n", greeting);
return 0;
}
test_opt.h
#include <stdint.h>
extern const char mystring[];
struct MyStruct {
uint16_t param_one;
uint8_t param_two;
unsigned char param_three[32];
};
typedef struct MyStruct MyStruct_t;
extern const MyStruct_t mystruct;
mystruct.c
#include "test_opt.h"
const char __attribute__((section(".MYSTRING"))) mystring[] = "Me, mystring, I am not being used";
const MyStruct_t __attribute__((section(".MYSTRUCT"))) mystruct = {
.param_one = 65535,
.param_two = 42,
.param_three = "myStructer here",
};
Test with usual MINGW64 gcc
Let's first try without -Wl,--gc-sections:
$ gcc -Wall -g mystruct.c test_opt.c -o test_opt.exe
$ strings ./test_opt.exe | grep -i 'mystring\|mystruct'
Me, mystring, I am not being used
*myStructer here
mystring
MyStruct
MyStruct_t
mystruct
mystruct.c
mystruct.c
mystruct.c
mystruct.c
mystring
mystruct
.MYSTRING
.MYSTRUCT
.MYSTRING
.MYSTRUCT
Clearly, variables and content are visible here.
Now let's try -Wl,--gc-sections:
$ gcc -Wall -g -Wl,--gc-sections mystruct.c test_opt.c -o test_opt.exe
$ strings ./test_opt.exe | grep -i 'mystring\|mystruct'
mystring
MyStruct
MyStruct_t
mystruct
mystruct.c
mystruct.c
mystruct.c
mystruct.c
mystring
mystruct
Apparently, here we still have some symbol debugging info left - but there are no sections, nor data being reported.
Test with ARM gcc
Let's re-do same experiment with ARM gcc - first without -Wl,--gc-sections:
$ arm-none-eabi-gcc -Wall -g test_opt.c mystruct.c -o test_opt.elf -lc -lnosys
$ arm-none-eabi-strings ./test_opt.elf | grep -i 'mystring\|mystruct'
Me, mystring, I am not being used
*myStructer here
mystruct.c
MyStruct_t
MyStruct
mystruct
mystruct.c
mystring
mystruct.c
mystring
mystruct
.MYSTRING
.MYSTRUCT
Same as before, variables, content and section names are visible.
Now let's try with -Wl,--gc-sections:
$ arm-none-eabi-gcc -Wall -g -Wl,--gc-sections test_opt.c mystruct.c -o test_opt.elf -lc -lnosys
$ arm-none-eabi-strings ./test_opt.elf | grep -i 'mystring\|mystruct'
Note that, unlike the previous case, here there is neither any data content left, nor any debugging info/symbol names!
So, my question is: assuming that -Wl,--gc-sections is enabled in the project, and I otherwise do not want to remove it (because I like the functionality otherwise), can I somehow specify in code, for some special variables, "keep these variables even if they are unused/unreferenced", in such a way that they are kept even with -Wl,--gc-sections enabled?
Note that adding keep to attributes, say:
const char __attribute__((keep,section(".MYSTRING"))) mystring[] = "Me, mystring, I am not being used";
... and compiling with (or without) -Wl,--gc-sections typically results in a compiler warning:
mystruct.c:3:1: warning: 'keep' attribute directive ignored [-Wattributes]
3 | const char __attribute__((keep,section(".MYSTRING"))) mystring[] = "Me, mystring, I am not being used";
| ^~~~~
... I guess, because the variables are already declared const if I read that arrow correctly (or maybe because a section is already assumed to be "kept")? So attribute keep is definitely not the answer here ...
To tell the linker that a variable must be preserved, use the -Wl,--undefined=XXX option:
gcc ... -Wl,--undefined=greeting
Note that __attribute__((used)) is a compiler-only flag to suppress -Wunused-variable warning.
OK - I found something; not ideal, but at least it's just a "syntax hack", and I don't have to come up with stupid stuff to do with the structs just so they show up in the executable (and usually even the code I come up with in that case gets optimized away :)).
I first tried the (void) varname; hack from "How can I suppress "unused parameter" warnings in C?" - I left it below just to show it doesn't work.
What ended up working is basically this: have a static const void * in the file where main() is, and assign a pointer to the struct to it (EDIT: inside main()!). I guess because of "static const", the compiler will not remove the variable and its section, even with -Wl,--gc-sections. So test_opt.c now becomes:
#include <stdio.h>
#include "test_opt.h"
const char greeting[] = "Hello World - am used";
static const void *fake; //, *fakeB;
int main(void) {
fake = &mystruct;
(void) &mystring; //fakeB = &mystring;
printf("%s!\n", greeting);
return 0;
}
... and we can test with:
$ arm-none-eabi-gcc -Wall -g -Wl,--gc-sections test_opt.c mystruct.c -o test_opt.elf -lc -lnosys
$ arm-none-eabi-readelf -a ./test_opt.elf | grep -i 'mystring\|mystruct'
[ 5] .MYSTRUCT PROGBITS 00013780 013780 000024 00 A 0 0 4
01 .init .text .fini .rodata .MYSTRUCT .ARM.exidx .eh_frame
5: 00013780 0 SECTION LOCAL DEFAULT 5 .MYSTRUCT
379: 00000000 0 FILE LOCAL DEFAULT ABS mystruct.c
535: 00013780 36 OBJECT GLOBAL DEFAULT 5 mystruct
$ arm-none-eabi-strings ./test_opt.elf | grep -i 'mystring\|mystruct'
*myStructer here
mystruct.c
MyStruct_t
mystruct
MyStruct
mystruct.c
mystring
mystruct.c
mystruct
.MYSTRUCT
Note that only mystruct in above example ended up being preserved - mystring still got optimized away.
EDIT: note that if you try to cheat and move the assignment outside of main:
static const void *fake = &mystruct, *fakeB = &mystring;
int main(void) {
...
... then the compiler will see through your shenanigans, and greet you with:
test_opt.c:6:39: warning: 'fakeB' defined but not used [-Wunused-variable]
6 | static const void *fake = &mystruct, *fakeB = &mystring;
| ^~~~~
test_opt.c:6:20: warning: 'fake' defined but not used [-Wunused-variable]
6 | static const void *fake = &mystruct, *fakeB = &mystring;
| ^~~~
... and you're none the better off still.
From what I saw across many, many Stack Overflow questions, among other places, the way to define globals is to define them in exactly one .c file and then declare them as extern in a header file, which then gets included in the .c files that need them.
However, today I saw a global variable defined in a header file in a codebase. I got into an argument about it, but the other person insisted it would work. I had no idea why, so I created a small project to test it out real quick:
a.c
#include <stdio.h>
#include "a.h"
int main()
{
p1.x = 5;
p1.x = 4;
com = 6;
change();
printf("p1 = %d, %d\ncom = %d\n", p1.x, p1.y, com);
return 0;
}
b.c
#include "a.h"
void change(void)
{
p1.x = 7;
p1.y = 9;
com = 1;
}
a.h
typedef struct coord{
int x;
int y;
} coord;
coord p1;
int com;
void change(void);
Makefile
all:
gcc -c a.c -o a.o
gcc -c b.c -o b.o
gcc a.o b.o -o run.out
clean:
rm a.o b.o run.out
Output
p1 = 7, 9
com = 1
How is this working? Is this an artifact of the way I've set up the test? Is it that newer gcc has managed to catch this condition? Or is my interpretation of the whole thing completely wrong? Please help...
This relies on so-called "common symbols", which are an extension of standard C's notion of tentative definitions (https://port70.net/~nsz/c/c11/n1570.html#6.9.2p2), except that most UNIX linkers make it work across translation units too (and many even with shared dynamic libraries).
AFAIK, the feature has existed since pretty much forever, and it had something to do with Fortran compatibility/similarity.
It works by the compiler giving uninitialized (tentative) globals a special "common" category (shown by the nm utility as "C", which stands for "common").
Example of data symbol categories:
#!/bin/sh -eu
(
cat <<EOF
int common_symbol; //C
int zero_init_symbol = 0; //B
int data_init_symbol = 4; //D
const int const_symbol = 4; //R
EOF
) | gcc -xc - -c -o data_symbol_types.o
nm data_symbol_types.o
Output:
0000000000000004 C common_symbol
0000000000000000 R const_symbol
0000000000000000 D data_init_symbol
0000000000000000 B zero_init_symbol
Whenever a linker sees multiple definitions of a particular symbol, it usually generates a linker error.
But when those definitions are in the common category, the linker will merge them into one.
Also, if there are N-1 common definitions of a particular symbol and one non-tentative definition (in the R, D, or B category), then all the definitions are merged into the one non-tentative definition, and again no error is generated.
In other cases you get symbol redefinition errors.
Although common symbols are widely supported, they aren't technically standard C and relying on them is theoretically undefined behavior (even though in practice it often works).
clang and tinycc, as far as I've noticed, do not generate common symbols (there you should get a redefinition error). On gcc, common symbol generation can be disabled with -fno-common, which has been the default since GCC 10.
(Ian Lance Taylor's series on linkers has more info on common symbols, and it also mentions how linkers even allow merging differently sized common symbols, using the largest size for the final object: https://www.airs.com/blog/archives/42 . I believe this weird trick was once used by libcs to some effect.)
That program should not build (well, it should compile, but you'll get double definition errors in the linking phase) because of how the variables are defined in your header file.
A header file informs the compiler about the external environment that it normally cannot guess by itself, such as external variables defined in other modules.
As your question deals with this, I'll try to explain the correct way to define a global variable in one module, and how to inform the compiler about it in other modules.
Let's say you have a module A.c with some variable defined in it:
A.c
int I_am_a_global_variable; /* you can even initialize it */
Normally, to let the compiler know, when compiling other modules, that the variable is defined elsewhere, you need to write something like the following (the trick is the extern keyword, which says that the variable is not defined here):
B.c
extern int I_am_a_global_variable; /* you cannot initialize it, as it is defined elsewhere */
As this is a property of the module A.c, we can write an A.h file stating that somewhere else in the program there is a variable named I_am_a_global_variable of type int, so that other modules are able to access it.
A.h
extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */
and, instead of declaring it in B.c, we can include the file A.h in B.c to ensure that the variable is declared as the author of A.c intended.
So now B.c is:
B.c
#include "A.h"
void some_function() {
/* ... */
I_am_a_global_variable = /* some complicated expression */;
}
This ensures that if the author of A.c decides to change the type or the declaration of the variable, they can do so by changing the file A.h, and all the files that #include it will be recompiled (you can arrange this in the Makefile, for example).
A.c
#include "A.h" /* extern int I_am_a_global_variable; */
int I_am_a_global_variable = 27;
To prevent errors, it is good for A.c to #include the file A.h as well, so that the declaration
extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */
and the final definition (which lives in A.c):
int I_am_a_global_variable = 27; /* I have initialized it to a non-default value to show how to do it */
are consistent with each other. (Suppose the author changes the type of I_am_a_global_variable to double and forgets to change the declaration in A.h: the compiler will complain about the mismatched declaration and definition when compiling A.c, which now includes A.h.)
Why do I say that you will have double definition errors when linking?
Well, if you compile several modules that each contain the following statements (with A.h #included in each of them):
#include "A.h" /* this has an extern int I_am_a_global_variable; that informs the
* compiler that the variable is defined elsewhere, but see below */
int I_am_a_global_variable; /* here is _elsewhere_ :) */
then all those modules will have a global variable I_am_a_global_variable, initialized to 0, because the compiler defines it in every module (you are not saying that the variable is defined elsewhere; you are telling the compiler to declare and define it in this compilation unit). When you link all the modules together, you end up with several definitions of a variable with the same name in several places, and references from other modules will not know which one to use.
The compiler doesn't know anything about the other compilations of an application when it is compiling module A, so you need some means to tell it what is happening around it. Just as you use function prototypes to tell it that there is a function somewhere that takes some number of arguments of types A, B, C, etc. and returns a value of type Z, you need to tell it that there is a variable defined elsewhere that has type X, so that all the accesses to it in this module are compiled correctly.
I'm new to the concepts of strong and weak symbols. In the following example (pure C) code, x is defined twice, once strong and once weak. I'd like to make my compiler report an error:
foo.c:
#include <stdio.h>
void f(void);
int x = 15213;
int main(){
f();
printf("x = %d\n", x);
return 0;
}
bar.c
int x;
void f(){
x = 15212;
}
For gcc, "-fno-common" is what I want:
gcc -o foobar foo.c bar.c -fno-common
With this option it reports the redefined symbol (x).
Is there an equivalent compile option in Visual Studio? (Correct me if I've described anything wrongly.)
This is a link option, not a compile option. The compiler doesn't know anything about x being defined anywhere other than in the file it is compiling. The linker, however, sees two definitions of x and will generate error LNK2005 or LNK1169.
So far I've assumed that objects with internal linkage (i.e. static functions and static variables) in C do not collide with other objects (of internal or external linkage) in other compilation units (i.e. .c files), so I've used "short" names for internal helper functions rather than prefixing everything with the library name. Recently a user of my library experienced a crash due to a name collision with an exported function from another shared library. On investigation it turned out that several of my static functions are part of the symbol table of the shared library. Since it happens with several GCC major versions, I assume I'm missing something (such a major bug would have been noticed and fixed).
I managed to get it down to the following minimum example:
#include <stdbool.h>
#include <stdlib.h>
bool ext_func_a(void *param_a, char const *param_b, void *param_c);
bool ext_func_b(void *param_a);
static bool bool_a, bool_b;
static void parse_bool_var(char *doc, char const *var_name, bool *var)
{
char *var_obj = NULL;
if (!ext_func_a(doc, var_name, &var_obj)) {
return;
}
*var = ext_func_b(var_obj);
}
static void parse_config(void)
{
char *root_obj = getenv("FOO");
parse_bool_var(root_obj, "bool_a", &bool_a);
parse_bool_var(root_obj, "bool_b", &bool_b);
}
void libexample_init(void)
{
parse_config();
}
Both the static variable bool_a and the static function parse_bool_var are visible in the symbol table of the object file and the shared library:
$ gcc -Wall -Wextra -std=c11 -O2 -fPIC -c -o example.o example.c
$ objdump -t example.o|egrep 'parse_bool|bool_a'
0000000000000000 l O .bss 0000000000000001 bool_a
0000000000000000 l F .text 0000000000000050 parse_bool_var
$ gcc -shared -Wl,-soname,libexample.so.1 -o libexample.so.1.1 example.o -fPIC
$ nm libexample.so.1.1 |egrep 'parse_bool|bool_a'
0000000000200b79 b bool_a
0000000000000770 t parse_bool_var
I've dived into C11, Ulrich Drepper's "How to Write Shared Libraries" and a couple of other sources explaining visibility of symbols, but I'm still at a loss. Why are bool_a and parse_bool_var ending up in the dynamic symbol table even though they're declared static?
The lower case letter in the second column of nm output means they're local (if they were upper case, it would be a different story).
Those symbols won't conflict with other symbols of the same name and AFAIK, are basically there only for debugging purposes.
Local symbols won't go into the dynamic symbol table either (it can be printed with nm -D, but only for shared libraries), and they can be stripped, along with exported symbols (upper case letters in the second column of nm output) that aren't dynamic.
(As you will have learned from Drepper's How to Write Shared Libraries, you can control visibility with -fvisibility=(default|hidden) (shouldn't use protected) and with visibility attributes.)
I am trying to do the following with the GCC compiler (is this possible?):
Declare a function such that, if it is not implemented, it points to NULL. Example:
extern void something(uint some);
If the function is not implemented, it should evaluate to NULL,
so that it is possible to check like this:
something != NULL ? something(222) : etc.;
I would like a solution that works through GCC (this could also be solved with function pointers).
This is definitely not portable, but GCC can do this with weak symbols on some platforms. I know it works on Linux and *BSD, but it doesn't work on macOS.
$ cat weak.c
#include <stdio.h>
extern int foo(void) __attribute__((__weak__));
int
main(int argc, char **argv)
{
int x = foo ? foo() : 42;
printf("%d\n", x);
return 0;
}
$ cat weak2.c
int
foo(void)
{
return 17;
}
$ cc -o weak weak.c && ./weak
42
$ cc -o weak weak.c weak2.c && ./weak
17
$
You can do this using GCC's weakref attribute:
extern void something(int);
static void something_else(int) __attribute__((weakref("something")));
int main()
{
if (something_else)
something_else(122);
}
If something is not defined in the program then the weak alias something_else will have an address of zero. If something is defined, something_else will be an alias for it.
Essentially you are trying to get the compiler to locate a function at the memory address 0 (NULL). This cannot be done in C without platform/compiler specific constructs.
One question though, is why you would ever want to do this. C is a static language, so if you know that the function will never exist during compilation you might as well just use the pre-processor to tell the rest of the program about this at compile time. Indeed these sorts of compile time substitutions are precisely why the preprocessor is there in the first place.
I would create a macro that you define if your function exists as follows:
#define THE_SOMETHING_FUNCTION_EXISTS
Then replace anywhere you would have tested for something == NULL with an #ifdef instead.
Of course, if the function’s existence might change at run-time then the correct way to implement the behaviour you want is to make something a function pointer.