Is it possible to remove dead code from a static library?

Is it possible to remove dead code from a static library? - c

I would like to remove dead code from a static library by specifying an entry point.
For instance:
lib1.c
int foo() { return 0; }
int bar() { return 0; }
lib2.c
#include "lib1.h"
int entry() {
return foo();
}
new.a (lib1.a + lib2.a)
libtool -static -o new.a lib1.a lib2.a
I would like new.a to not contain int bar() because it is unused in the entry point of lib1.a, and I don't plan on using lib2.a directly.
Is this possible?

If you compile with -ffunction-sections (and possibly -fdata-sections) and link with -Wl,--gc-sections, the unreferenced functions will be removed. This is subtly different from them not being present to begin with (for example, if bar contained references to other functions or data, it could cause the files containing them to be pulled in for consideration, possibly resulting in new global ctors or overriding weak definitions) but close enough for most purposes.
The right way, on the other hand, is not to define functions that can be used independently in the same translation unit (source file). Split them into separate files and this just works automatically with no special options.

Related

Why does global variable definition in C header file work? [duplicate]

This question already has answers here:
What happens if I define the same variable in each of two .c files without using "extern"?
(3 answers)
Closed 2 years ago.
From what I saw across many many stackoverflow questions among other places, the way to define globals is to define them in exactly one .c file, then declare it as an extern in a header file which then gets included in the required .c files.
However, today I saw in a codebase global variable definition in the header file and I got into arguing, but he insisted it will work. Now, I had no idea why, so I created a small project to test it out real quick:
a.c
#include <stdio.h>
#include "a.h"
int main()
{
p1.x = 5;
p1.x = 4;
com = 6;
change();
printf("p1 = %d, %d\ncom = %d\n", p1.x, p1.y, com);
return 0;
}
b.c
#include "a.h"
void change(void)
{
p1.x = 7;
p1.y = 9;
com = 1;
}
a.h
typedef struct coord{
int x;
int y;
} coord;
coord p1;
int com;
void change(void);
Makefile
all:
gcc -c a.c -o a.o
gcc -c b.c -o b.o
gcc a.o b.o -o run.out
clean:
rm a.o b.o run.out
Output
p1 = 7, 9
com = 1
How is this working? Is this an artifact of the way I've set up the test? Is it that newer gcc has managed to catch this condition? Or is my interpretation of the whole thing completely wrong? Please help...

This relies on so called "common symbols" which are an extension to standard C's notion of tentative definitions (https://port70.net/~nsz/c/c11/n1570.html#6.9.2p2), except most UNIX linkers make it work across translation units too (and many even with shared dynamic libaries)
AFAIK, the feature has existed since pretty much forever and it had something to do with fortran compatibility/similarity.
It works by the compiler placing giving uninitialized (tentative) globals a special "common" category (shown in the nm utility as "C", which stands for "common").
Example of data symbol categories:
#!/bin/sh -eu
(
cat <<EOF
int common_symbol; //C
int zero_init_symbol = 0; //B
int data_init_symbol = 4; //D
const int const_symbol = 4; //R
EOF
) | gcc -xc - -c -o data_symbol_types.o
nm data_symbol_types.o
Output:
0000000000000004 C common_symbol
0000000000000000 R const_symbol
0000000000000000 D data_init_symbol
0000000000000000 B zero_init_symbol
Whenever a linker sees multiple redefinitions for a particular symbol, it usually generates linkers errors.
But when those redefinitions are in the common category, the linker will merge them into one.
Also, if there are N-1 common definitions for a particular symbol and one non-tentative definition (in the R,D, or B category), then all the definitions are merged into the one nontentative definition and also no error is generated.
In other cases you get symbol redefinition errors.
Although common symbols are widely supported, they aren't technically standard C and relying on them is theoretically undefined behavior (even though in practice it often works).
clang and tinycc, as far as I've noticed, do not generate common symbols (there you should get a redefinition error). On gcc, common symbol generation can be disabled with -fno-common.
(Ian Lance Taylor's serios on linkers has more info on common symbols and it also mentions how linkers even allow merging differently sized common symbols, using the largest size for the final object: https://www.airs.com/blog/archives/42 . I believe this weird trick was once used by libc's to some effect)

That program should not compile (well it should compile, but you'll have double definition errors in your linking phase) due to how the variables are defined in your header file.
A header file informs the compiler about external environment it normally cannog guess by itself, as external variables defined in other modules.
As your question deals with this, I'll try to explain the correct way to define a global variable in one module, and how to inform the compiler about it in other modules.
Let's say you have a module A.c with some variable defined in it:
A.c
int I_am_a_global_variable; /* you can even initialize it */
well, normally to make the compiler know when compiling other modules that you have that variable defined elsewhere, you need to say something like (the trick is in the extern keyword used to say that it is not defined here):
B.c
extern int I_am_a_global_variable; /* you cannot initialize it, as it is defined elsewhere */
As this is a property of the module A.c, we can write a A.h file, stating that somewhere else in the program, there's a variable named I_am_a_global_variable of type int, in order to be able to access it.
A.h
extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */
and, instead of declaring it in B.c, we can include the file A.h in B.c to ensure that the variable is declared as the author of B.c wanted to.
So now B.c is:
B.c
#include "A.h"
void some_function() {
/* ... */
I_am_a_global_variable = /* some complicated expression */;
}
this ensures that if the author of B.c decides to change the type or the declaration of the variable, he can do changing the file A.h and all the files that #include it should be recompiled (you can do this in the Makefile for example)
A.c
#include "A.h" /* extern int I_am_a_global_variable; */
int I_am_a_global_variable = 27;
In order to prevent errors, it is good that A.c also #includes the file A.h, so the declaration
extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */
and the final definition (that is included in A.c):
int I_am_a_global_variable = 23; /* I have initialized it to a non-default value to show how to do it */
are consistent between them (consider the author changes the type of I_am_a_global_variable to double and forgets to change the declaration in A.h, the compiler will complaint about non-matching declaration and definition, when compiling A.c (which now includes A.h).
Why I say that you will have double definition errors when linking?
Well, if you compile several modules with the statement (result of #includeing the file A.h in several modules) with the statement:
#include "A.h" /* this has an extern int I_am_a_global_variable; that informs the
* compiler that the variable is defined elsewhere, but see below */
int I_am_a_global_variable; /* here is _elsewhere_ :) */
then all those modules will have a global variable I_m_a_global_variable, initialized to 0, because the compiler defined it in every module (you don't say that the variable is defined elsewhere, you are stating it to declare and define it in this compilation unit) and when you link all the modules together you'll end with several definitions of a variable with the same name at several places, and the references from other modules using this variable will don't know which one is to be used.
The compiler doesn't know anything of other compilations for an application when it is compiling module A, so you need some means to tell it what is happening around. The same as you use function prototypes to indicate it that there's a function somewhere that takes some number of arguments of types A, B, C, etc. and returns a value of type Z, you need to tell it that there's a variable defined elsewhere that has type X, so all the accesses you do to it in this module will be compiled correctly.

VS Code shows error in cs50.h header file downloaded from GitHub [duplicate]

Just a simple program, but I keep getting this compiler error. I'm using MinGW for the compiler.
Here's the header file, point.h:
//type for a Cartesian point
typedef struct {
double x;
double y;
} Point;
Point create(double x, double y);
Point midpoint(Point p, Point q);
And here's point.c:
//This is the implementation of the point type
#include "point.h"
int main() {
return 0;
}
Point create(double x, double y) {
Point p;
p.x = x;
p.y = y;
return p;
}
Point midpoint(Point p, Point q) {
Point mid;
mid.x = (p.x + q.x) / 2;
mid.y = (p.y + q.y) / 2;
return mid;
}
And here's where the compiler issue comes in. I keep getting:
testpoint.c: undefined reference to 'create(double x, double y)'
While it is defined in point.c.
This is a separate file called testpoint.c:
#include "point.h"
#include <assert.h>
#include <stdio.h>
int main() {
double x = 1;
double y = 1;
Point p = create(x, y);
assert(p.x == 1);
return 0;
}
I'm at a loss as to what the issue could be.

How are you doing the compiling and linking? You'll need to specify both files, something like:
gcc testpoint.c point.c
...so that it knows to link the functions from both together. With the code as it's written right now, however, you'll then run into the opposite problem: multiple definitions of main. You'll need/want to eliminate one (undoubtedly the one in point.c).
In a larger program, you typically compile and link separately to avoid re-compiling anything that hasn't changed. You normally specify what needs to be done via a makefile, and use make to do the work. In this case you'd have something like this:
OBJS=testpoint.o point.o
testpoint.exe: $(OBJS)
gcc $(OJBS)
The first is just a macro for the names of the object files. You get it expanded with $(OBJS). The second is a rule to tell make 1) that the executable depends on the object files, and 2) telling it how to create the executable when/if it's out of date compared to an object file.
Most versions of make (including the one in MinGW I'm pretty sure) have a built-in "implicit rule" to tell them how to create an object file from a C source file. It normally looks roughly like this:
.c.o:
$(CC) -c $(CFLAGS) $<
This assumes the name of the C compiler is in a macro named CC (implicitly defined like CC=gcc) and allows you to specify any flags you care about in a macro named CFLAGS (e.g., CFLAGS=-O3 to turn on optimization) and $< is a special macro that expands to the name of the source file.
You typically store this in a file named Makefile, and to build your program, you just type make at the command line. It implicitly looks for a file named Makefile, and runs whatever rules it contains.
The good point of this is that make automatically looks at the timestamps on the files, so it will only re-compile the files that have changed since the last time you compiled them (i.e., files where the ".c" file has a more recent time-stamp than the matching ".o" file).
Also note that 1) there are lots of variations in how to use make when it comes to large projects, and 2) there are also lots of alternatives to make. I've only hit on the bare minimum of high points here.

I had this issue recently. In my case, I had my IDE set to choose which compiler (C or C++) to use on each file according to its extension, and I was trying to call a C function (i.e. from a .c file) from C++ code.
The .h file for the C function wasn't wrapped in this sort of guard:
#ifdef __cplusplus
extern "C" {
#endif
// all of your legacy C code here
#ifdef __cplusplus
}
#endif
I could've added that, but I didn't want to modify it, so I just included it in my C++ file like so:
extern "C" {
#include "legacy_C_header.h"
}
(Hat tip to UncaAlby for his clear explanation of the effect of extern "C".)

I think the problem is that when you're trying to compile testpoint.c, it includes point.h but it doesn't know about point.c. Since point.c has the definition for create, not having point.c will cause the compilation to fail.
I'm not familiar with MinGW, but you need to tell the compiler to look for point.c. For example with gcc you might do this:
gcc point.c testpoint.c
As others have pointed out, you also need to remove one of your main functions, since you can only have one.

Add the "extern" keyword to the function definitions in point.h

I saw here that this question
In c programming language, i keep getting this error
has been answered here so the thread seems closed for answers.
I disagree. It is different code.
The answer should be that we don't know what is in custom header file "functions.h".
Also, we don't know what are
MAPA m;
POSICAO heroi;
Are these functions, constants?
If these were some constants, one should expect #define in front of them, and no semicolon e.g.
#define MAPA m
#define POSICAO heroi
If You intended to prototype the function, since there's is semicolon behing, than You did not insert the parentheses ().
In that case MAPA and POSICAO are some custom-type functions, whose content should be determined in "Functions.h"
Also, there's a possibilty that You wanted to import the functions or variable or constant from some other directory, and in that case You're missing the word
extern MAPA m;

I had a similar problem running a bunch of .c files in a directory all linking to one header file with custom function prototypes.
I ran:
$gcc -Wall -Werror -Wextra -pedantic -std=gnu89 *.c
Getting these errors:
/usr/bin/ld: /tmp/ccovH4zH.o: in function `_puts': 3-puts.c:(.text+0x2f): undefined reference to `_putchar'
/usr/bin/ld: 3-puts.c:(.text+0x51): undefined reference to `_putchar'
/usr/bin/ld: /tmp/ccGeWRqI.o: in function `main': _putchar.c:(.text+0xe): undefined reference to `_putchar'
/usr/bin/ld: _putchar.c:(.text+0x18): undefined reference to `_putchar'
/usr/bin/ld: _putchar.c:(.text+0x22): undefined reference to `_putchar'
/usr/bin/ld: /tmp/ccGeWRqI.o:_putchar.c:(.text+0x2c): more undefined references to `_putchar' follow
collect2: error: ld returned 1 exit status
Note: All files were linked to the same header file with all the function declarations.
I manage to compile successfully after adding -c option to the gcc compiler like:
$gcc -Wall -Werror -Wextra -pedantic -std=gnu89 -c *.c
This run successfully.
Just in case anyone comes across the same.

What's the difference between declaring a function static and not including it in a header? [duplicate]

This question already has answers here:
What is a "static" function in C?
(11 answers)
Closed 2 years ago.
What's the difference between the following scenarios?
// some_file.c
#include "some_file.h" // doesn't declare some_func
int some_func(int i) {
return i * 5;
}
// ...
and
// some_file.c
#include "some_file.h" // doesn't declare some_func
static int some_func(int i) {
return i * 5;
}
// ...
If all static does to functions is restrict their accessibility to their file, then don't both scenarios mean some_func(int i) can only be accessed from some_file.c since in neither scenario is some_func(int i) put in a header file?

A static function is "local" to the .c file where it is declared in. So you can have another function (static or nor) in another .c file without having a name collision.
If you have a non static function in a .c file that is not declared in any header file, you cannot call this function from another .c file, but you also cannot have another function with the same name in another .c file because this would cause a name collision.
Conclusion: all purely local functions (functions that are only used inside a .c function such as local helper functions) should be declared static in order to prevent the pollution of the name space.
Example of correct usage:
file1.c
static void LocalHelper()
{
}
...
file2.c
static void LocalHelper()
{
}
...
Example of semi correct usage
file1.c
static LocalHelper() // function is local to file1.c
{
}
...
file2.c
void LocalHelper() // global functio
{
}
...
file3.c
void Foo()
{
LocalHelper(); // will call LocalHelper from file2.c
}
...
In this case the program will link correctly, even if LocalHelpershould have been static in file2.c
Example of incorrect usage
file1.c
LocalHelper() // global function
{
}
...
file2.c
void LocalHelper() // global function
{
}
...
file3.c
void Foo()
{
LocalHelper(); // which LocalHelper should be called?
}
...
In this last case we hava a nema collition and the program wil not even link.

The difference is that with a non-static function it can still be declared in some other translation unit (header files are irrelevant to this point) and called. A static function is simply not visible from any other translation unit.
It is even legal to declare a function inside another function:
foo.c:
void foo()
{
void bar();
bar();
}
bar.c:
void bar()
{ ... }

What's the difference between declaring a function static and not including it in a header?
Read more about the C programming language (e.g. Modern C), the C11 standard n1570. Read also about linkers and loaders.
On POSIX systems, notably Linux, read about dlopen(3), dlsym(3), etc...
Practically speaking:
If you declare a function static it stays invisible to other translation units (concretely your *.c source files, with the way you compile them: you could -at least in principle, even if it is confusing- compile foo.c twice with GCC on Linux: once as gcc -c -O -DOPTION=1 foo.c -o foo1.o and another time with gcc -c -O -DOPTION=2 -DWITHOUT_MAIN foo.c -o foo2.o and in some cases be able to link both foo1.o and foo2.o object files into a single executable foo using gcc foo1.o foo2.o -o foo)
If you don't declare it static, you could (even if it is poor taste) code in some other translation unit something like:
if (x > 2) {
extern int some_func(int); // extern is here for readability
return some_func(x);
}
In practice, I have the habit of naming all my C functions (including static ones) with a unique name, and I generally have a naming convention related to them (e.g. naming them with a common prefix, like GTK does). This makes debugging the program (with GDB) easier (since GDB has autocompletion function names).
At last, a good optimizing compiler could, and practically would often, inline calls to static functions whose body is known. With a recent GCC, compile your code with gcc -O2 -Wall. If you want to check how inlining happened, look into the produced assembler (using gcc -O2 -S -fverbose-asm).
On Linux, you can obtain using dlsym the address of a function which is not declared static.

How to merge statics for all instances of header-defined inline function?

Having a header that defines some static inline function that contains static variables in it, how to achieve merging of identical static local variables across all TUs that comprise final loadable module?. In a less abstract way:
/*
* inc.h
*/
#include <stdlib.h>
/*
* This function must be provided via header. No extra .c source
* is allowed for its definition.
*/
static inline void* getPtr() {
static void* p;
if (!p) {
p = malloc(16);
}
return p;
}
/*
* 1.c
*/
#include "inc.h"
void* foo1() {
return getPtr();
}
void* bar1() {
return getPtr();
}
/*
* 2.c
*/
#include "inc.h"
void* foo2() {
return getPtr();
}
void* bar2() {
return getPtr();
}
Platform is Linux, and this file set is built via:
$ clang -O2 -fPIC -shared 1.c 2.c
It is quite expected that both TUs receive own copies of getPtr.p. Though inside each TU getPtr.p is shared across all getPtr() instantiations. This can be confirmed by inspecting final loadable binary:
$ readelf -s --wide a.out | grep getPtr
32: 0000000000201030 8 OBJECT LOCAL DEFAULT 21 getPtr.p
34: 0000000000201038 8 OBJECT LOCAL DEFAULT 21 getPtr.p
At the same time I'm looking for a way of how to share getPtr.p across separate TU boundary. This vaguely resembles what happens with C++ template instantiations. And likely GRP_COMDAT would help me but I was not able to find any info about how to label my static var to be put into COMDAT.
Is there any attribute or other source-level (not a compiler option) way to achieve merging such objects?

If I understand correctly what you want, you can get this effect by simply declaring a global variable.
/*
* inc.h
*/
void* my_p;
static inline void* getPtr() {
if (!my_p) {
my_p = malloc(16);
}
return my_p;
}
This will use the same variable my_p for all instances of getPtr throughout the program (since it's global). And it is not necessary to have an explicit definition of my_p in any module. It will be initialized to NULL, which is just what you want. So nothing besides inc.h needs to change, and no additional .c file is needed.
Of course, you'll probably want to give my_p a name that is less likely to conflict with any identifier in the user's program. Maybe Sergios_include_file_p_for_getPtr or something of the sort.
This is actually an extension to standard C (mentioned in Annex J.5.11 in N2176), but it's provided by gcc and clang on most modern platforms. It's documented under the -fcommon compiler option (which is enabled by default). It's typically implemented by putting the variable in a common section, and the linker then merges all instances together, just as you suggest. But the code above shows how to access the feature without needing to use attributes or other obscure incantations.
If you want to be extra paranoid, you can declare my_p with __attribute__((common)) which will cause the variable to be treated in this way even if -fno-common is in effect. (Of course, that may cause trouble if -fno-common was being used for a reason...)

Static function and variable get exported in shared library

So far I've assumed that objects with static linkage (i.e. static functions and static variables) in C do not collide with other objects (of static or external linkage) in other compilation units (i.e. .c files) so I've used "short" names for internal helper functions rather than prefixing everything with the library name. Recently a user of my library experienced a crash due to a name collision with an exported function from another shared library. On investigation it turned out that several of my static functions are part of the symbol table of the shared library. Since it happens with several GCC major versions I assume I'm missing something (such a major bug would be noticed and fixed).
I managed to get it down to the following minimum example:
#include <stdbool.h>
#include <stdlib.h>
bool ext_func_a(void *param_a, char const *param_b, void *param_c);
bool ext_func_b(void *param_a);
static bool bool_a, bool_b;
static void parse_bool_var(char *doc, char const *var_name, bool *var)
{
char *var_obj = NULL;
if (!ext_func_a(doc, var_name, &var_obj)) {
return;
}
*var = ext_func_b(var_obj);
}
static void parse_config(void)
{
char *root_obj = getenv("FOO");
parse_bool_var(root_obj, "bool_a", &bool_a);
parse_bool_var(root_obj, "bool_b", &bool_b);
}
void libexample_init(void)
{
parse_config();
}
Both the static variable bool_a and the static function parse_bool_var are visible in the symbol table of the object file and the shared library:
$ gcc -Wall -Wextra -std=c11 -O2 -fPIC -c -o example.o example.c
$ objdump -t example.o|egrep 'parse_bool|bool_a'
0000000000000000 l O .bss 0000000000000001 bool_a
0000000000000000 l F .text 0000000000000050 parse_bool_var
$ gcc -shared -Wl,-soname,libexample.so.1 -o libexample.so.1.1 x.o -fPIC
$ nm libexample.so.1.1 |egrep 'parse_bool|bool_a'
0000000000200b79 b bool_a
0000000000000770 t parse_bool_var
I've dived into C11, Ulrich Drepper's "How to Write Shared Libraries" and a couple of other sources explaining visibility of symbols, but I'm still at a loss. Why are bool_a and parse_bool_var ending up in the dynamic symbol table even though they're declared static?

The lower case letter in the second column of nm output means they're local (if they were upper case, it would be a different story).
Those symbols won't conflict with other symbols of the same name and AFAIK, are basically there only for debugging purposes.
Local symbols won't go into the dynamic symbol table (printable with nm -D but only in shared libraries) either and they are strippable along with exported symbols (upper case letters in the second column of nm output) that aren't dynamic.
(As you will have learned from Drepper's How to Write Shared Libraries, you can control visibility with -fvisibility=(default|hidden) (shouldn't use protected) and with visibility attributes.)