Detect undefined symbols in C header file


Suppose I wrote a C library that provides a bunch of "public" functions, declared in a mylib.h header file. Those functions are supposedly implemented in (say) a mylib.c file, which is compiled into (say) a static library: mylib.c -> mylib.o -> mylib.a.
Is there some way to detect that I forgot to provide the implementation of some declared function in mylib.h? (Yes, I know about unit testing, good practices, etc - and, yes, I understand the meaning of a plain function declaration in C).
Suppose mylib.h declares void func1(); and this function was not coded in the provided library. This will trigger an error only if the linker needs to use that function. Otherwise, everything compiles fine, without even a warning, AFAIK. Is there a way (perhaps compiler dependent) to trigger a warning for declared but not implemented functions, or is there any other way to deal with this issue?
BTW: nm -u does not list all declared-but-undefined functions, only those "used" by the library, i.e., those functions that will trigger an error in the linking phase if they are not defined somewhere. (Which makes sense: the library object file does not know about header files, of course.)

Basically, the most reliable way is to have a program (or possibly a series of programs) that exercises each and every one of the functions. If one of the programs fails to link because of a missing symbol, you've goofed.
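A minimal sketch of such a link check, assuming hypothetical signatures for func1 and func2: instead of calling every function, it stores each function's address in a table with external linkage, which is enough to make the linker resolve every symbol. The stand-in definitions are there only so the sketch links on its own; in real use they would be removed and the file linked against mylib.a.

```c
#include <stddef.h>

/* Stand-ins for the declarations in mylib.h (hypothetical signatures). */
void func1(void);
int  func2(int x);

/* Stand-in definitions so this sketch links by itself.
   In real use, delete these two lines and link against mylib.a. */
void func1(void) { }
int  func2(int x) { return x + 1; }

typedef void (*generic_fn)(void);

/* One entry per declared function. The table has external linkage, so the
   compiler must emit it, and each entry is a reference the linker has to
   resolve. A function with no definition shows up as an
   "undefined reference" error at link time. */
const generic_fn required_symbols[] = {
    (generic_fn)func1,
    (generic_fn)func2,
};

const size_t required_symbol_count =
    sizeof required_symbols / sizeof required_symbols[0];
```

Linking this file together with any main() against the library is then the whole test: the functions never need to be called for a missing definition to surface.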
I suppose you could try to do something by editing a copy of the header into a source file (as in, file ending .c), converting the function declarations into dummy function definitions:
Original:
extern int somefunc(void);
Revised:
extern int somefunc(void){}
Then compile the modified source with minimum warnings - and ignore anything to do with "function that is supposed to return a value doesn't". Then compare the defined symbols in the object file from the revised source with the defined symbols in the library (using nm -g on Unix-like systems). Anything present in the object file that isn't present in the library is missing and should be supplied.
Note: if your header includes other headers of your own which define functions, you need to process all of those. If your header includes standard headers such as <stdio.h>, then clearly you won't be defining functions such as fopen() or printf() in the ordinary course of events. So, choose the headers you reprocess into source code carefully.
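The final comparison step can be sketched as follows, assuming the two symbol lists (names defined by the stub object, names defined by the library) have already been extracted, for example from nm -g output. The function name find_missing and its parameters are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Count the names in `declared` that do not appear in `defined`.
   Up to `cap` of the missing names are stored in `missing`, in order. */
size_t find_missing(const char *const *declared, size_t n_declared,
                    const char *const *defined, size_t n_defined,
                    const char **missing, size_t cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n_declared; i++) {
        int found = 0;
        for (size_t j = 0; j < n_defined; j++) {
            if (strcmp(declared[i], defined[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found) {
            if (count < cap)
                missing[count] = declared[i];
            count++;
        }
    }
    return count;
}
```

A quadratic scan is fine for the few dozen symbols of a typical header; for very large lists, sorting both inputs first would be the obvious refinement.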

There's no easy way.
For example, you can analyse the output of clang -Xclang -ast-print-xml or gcc-xml and filter out declarations with no implementations for a given .h file.

You could grep for signatures of exported function in both .h and .c, and compare the lists.
Use wc -l to count the matches; both numbers should be equal.
Another thought that just came to mind: IMHO it is not possible to handle this using the compiler, because a function declared in mylib.h is not always implemented in mylib.c.

Is there some way to detect that I forgot to provide the implementation of some declared function in mylib.h?
Write the implementation first, then worry about header contents -- because that way, it can be flagged.

Related

Why is there a need to link with `printf.o` if it's already defined inside `stdio.h`?

As far as I understand, when we include stdio.h the entire file is copied into the text of our program (basically, it is prepended). Then, when we call printf(), why does it have to be linked first?
Is it that stdio.h contains just the definitions of those functions, and the compiler finds the compiled object file for the function that we invoke, for example printf()?
I've read about this in a few places, but it's still not clear to me.

Header files like stdio.h generally just contain a declaration that defines the name of the function, the types of its arguments, and its return value. This is enough for the compiler to generate calls to the function, but it is not a definition. The actual code that implements the function is going to be in a library (with an extension like .a or .o or .lib).

Nope. printf() lives in libc.so, the C standard library. The compiler automatically passes -lc when it calls the linker, so you don't need to add that option. Only when you link your object files by calling the linker ld directly do you need to supply the option yourself, along with the other files that form the C runtime: the linker doesn't know that you are linking a C program, so it doesn't know which files to link with your modules, but the compiler does. See the documentation of the compiler you use, or just use the -v option when building the executable to see the command line the compiler uses to call the linker; you will see all the objects and libraries the compiler requires for every C program.

What is the difference between stdio.c and stdio.h?

Couldn't stdio functions and variables be defined in header files, without having to use .c files?
If not, what are .c files used for?

The functions declared in the header file have to be implemented. The .c file contains the implementation, though these have already been compiled into a static or shared library that your compiler can use.
The header file should contain a minimal description of the function to save time when compiling. If it included the entire source it'd force the compiler to rebuild it each and every time you compile which is really wasteful since that source never changes.
In effect, the header file serves as a cheat sheet on how to interact with the already compiled library.
The reason the .c files are provided is primarily for debugging, so your debugger can step through in your debug build and show you source instead of raw machine code. In rare cases you may want to look at the implementation of a particular function in order to better understand it, or in even more rare cases, identify a bug. They're not actually used to compile your program.
In your code you should only ever reference the header file version, the .h via an #include directive.

stdio.h is a standard header, required to be provided by every conforming hosted C implementation. It declares, but does not define, a number of entities, mostly library functions like putchar and scanf.
stdio.c, if it exists, is likely to be a C source file that defines the functions declared in stdio.h. There is no requirement that an implementation must make it available. It might not even exist; for example the implementations of the functions declared in stdio.h might appear in multiple *.c files.
The declaration of putchar is:
int putchar(int c);
and that's all the compiler needs to know when it sees a call to putchar in your program. The code that implements putchar is typically provided as machine code, and the linker's job is to resolve your putchar() call so it ends up invoking that code. putchar() might not even be written in C (though it probably is).
An executable program can be built from multiple *.c source files. One and only one copy of the code that implements putchar is needed for an entire program. If the implementation of putchar were in the header file, then it would be included in each separately compiled source file, creating conflicts and, at best, wasting space. The code that implements putchar() (and all the other functions in the library) only needs to be compiled once.
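A minimal sketch of the declaration/definition split, using a hypothetical function put_digit rather than the real putchar. Everything is placed in one file here only so that it compiles on its own; in a real library the definition at the bottom would sit in a separately compiled source file.

```c
/* Declaration only: this is the kind of line a header provides.
   It tells the compiler the name, parameter types and return type. */
int put_digit(int d);

/* The compiler can translate these calls knowing only the declaration
   above; at this point it has not seen any implementation. */
int print_twice(int d)
{
    return put_digit(d) + put_digit(d);
}

/* Definition: the actual code. Exactly one copy of this may exist in
   the whole program, which is why it does not belong in a header. */
int put_digit(int d)
{
    return (d >= 0 && d <= 9) ? '0' + d : -1;
}
```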

Each .c file holds the functions for a particular purpose. For example, stdio.c would contain the standard input/output functions used in C programs, while the stdio.h header contains the prototypes for all of those functions, plus the related defines, macros, etc. When you #include <stdio.h> in your main code.c file, your code assumes there is a function "int printf(const char *format, ...)" that returns an int and accepts the given arguments. When you call printf(), you are actually using the code from stdio.c.

There are languages where if you want to make use of something someone else has written, you say something like
import module
and that takes care of everything.
C is not one of those languages.
You could put "library" source code in a file, and then use #include to pull it in wherever you needed it. But this wouldn't work at all, for two reasons:
If you used #include to pull it in from two different source files, and then linked the two resulting object files together, everything in the "library" would be defined twice.
You might not want to deliver your "library" code as source; you might prefer to deliver it in compiled, object form.

How to correctly include own libraries in function files and project files

I got stuck trying to do Exercise 8-3 of K&R, the goal of the exercise is to rewrite some functions of stdio.h such as fopen, fclose, fillbuf and flushbuf
here's how my source files are organized:
stdio.h: contains types and macro definitions, and the declarations of some functions proper to the library. all content of the file is enclosed between #ifndef #endif lines as follows:
#ifndef STDIO_H
#define STDIO_H
/* content of stdio.h */
#endif
myfunction.c: I have a .c file per function, each file has a #include "stdio.h" line to load all needed types definitions.
main.c: where I have code to test my functions, the main.c also has a #include "stdio.h" line.
my problem is the following: when I try to compile all my files using gcc I run into the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included (_iob is a variable I only defined inside my stdio.h). Why is this happening? I thought the #ifndef line was there specifically to prevent such errors.
more generally:
How would you go about making your own header files and library/function files and using them in your projects ?
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions ?

Please become aware of the difference between a library and its header files.
A library is a (collection of) binary machine code (with some additional meta-data, e.g. relocation directives to the linker).
For example, on my Linux system, dynamic libraries are generally shared objects (e.g. /usr/lib/x86_64-linux-gnu/libgmp.so) and it makes absolutely no sense to try some preprocessor directive like #include "libgmp.so" //wrong.
But a library has some API. That API is given by some documentation and by some header file(s), e.g. gmp.h and you should #include "gmp.h" in any C code (your C translation unit) which uses it.
myfunction.c: I have a .c file per function
Having one file per function is often poor taste. You generally can group related functions. For example, in your case, you probably want to define your myfopen and myfclose functions in the same myopenclose.c translation unit (even if you don't have to) because these two functions are intimately related. As a rule of thumb, I prefer having source files of one or a few thousand lines each (but that is really a matter of taste, and some people like having many small files).
Remember that what the compiler really sees is the preprocessed form of code. Consider asking your compiler to produce that form (e.g. from foo.c you can get its preprocessed form foo.i with gcc -C -E -Wall foo.c > foo.i on my Linux desktop) and look into it. Try that on your own files (e.g. your myopenclose.c if you have one).
If you have many small files, the compiler is probably including the same headers in each of them, and these included declarations get compiled every time. BTW, notice that gcc is only a driver program. Use it with the -v flag. You'll see that it runs cc1 (the C compiler proper), as (the assembler), ld (the linker), etc.
I run into the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included, (_iob is a variable I only defined inside my stdio.h).
You probably should declare extern your _iob global variable in your stdio.h and define a global _iob in only one implementation file (perhaps myopenclose.c, if it is relevant) of your library.
Don't confuse definition and declaration (of variables, functions, types, etc.). Spend some time reading the C11 standard n1570. These words are defined there. As a rule of thumb, declarations should go into header .h files, definitions (of variables and functions) in implementation .c files (of course details are much more complex, you often but not always define types and struct in header files).
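A minimal sketch of that split, with hypothetical names (mystdio.h, MY_FILE, my_iob, my_fileno) chosen to avoid clashing with the system's own stdio. The header part and the implementation part are shown in one block, separated by comments, only so the example is self-contained.

```c
/* ---- would live in the header, e.g. mystdio.h (hypothetical) ---- */
#ifndef MYSTDIO_H
#define MYSTDIO_H

#define MY_OPEN_MAX 20

typedef struct my_iobuf {
    int cnt;   /* characters left in the buffer */
    int fd;    /* file descriptor */
} MY_FILE;

/* Declaration: announces the array but allocates no storage, so every
   .c file that includes the header can see it without defining it. */
extern MY_FILE my_iob[MY_OPEN_MAX];

#endif /* MYSTDIO_H */

/* ---- would live in exactly one implementation file ---- */

/* Definition: the single place where storage is allocated. */
MY_FILE my_iob[MY_OPEN_MAX] = {
    { 0, 0 },   /* stdin  */
    { 0, 1 },   /* stdout */
    { 0, 2 },   /* stderr */
};

int my_fileno(const MY_FILE *fp)
{
    return fp->fd;
}
```

With this layout, including the header from any number of source files no longer produces a "multiple definition" error, because only one object file contains the array itself.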
I strongly recommend using some Linux distribution (it is very developer- and student- friendly) and studying the source code of some existing free software C standard library (like musl-libc, whose code is quite readable). More generally, study the source code of existing free software projects (e.g. on github). They will inspire you.
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions ?
This shows a lot of confusion (the above question does not make any sense). Read more about compilers (your cc1 program -started by gcc- is translating a .c file into some object file .o) and about linkers (your ld, generally started by gcc, is agglomerating several object files, processing relocations inside them, and producing an ELF library or an executable). The preprocessing (e.g. of #include directive) is done at compile time by cc1. The linker cannot see any header files (it only deals with object files or libraries).

If you rewrite some of the system declarations and functions, while at the same time including the system declarations, you can expect some collisions.
Header files (.h) contain code (usually only declarations), and the mechanism you describe (#ifndef STDIO_H) is there to prevent multiple inclusions of the same header file, mainly because another include file (header) that has already been loaded might also include it. That results in the same kind of collision as the one you had.
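A small sketch of what the guard does, with a hypothetical header point.h simulated inline by writing its contents twice. Note that the guard only works within a single source file: each .c file is compiled separately and starts with the guard macro undefined, so a variable defined in a header is still defined once per file that includes it.

```c
/* First "inclusion" of the hypothetical point.h: POINT_H is not yet
   defined, so the contents are compiled. */
#ifndef POINT_H
#define POINT_H
struct point { int x, y; };
#endif

/* Second "inclusion" in the same file: POINT_H is now defined, so the
   preprocessor skips the body and struct point is not redefined. */
#ifndef POINT_H
#define POINT_H
struct point { int x, y; };
#endif

int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Removing the two pairs of guard lines makes the file fail to compile with a redefinition error for struct point, which is the kind of collision the guard exists to prevent.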
In C, you could, for instance
make a new header file that contains your own declarations + the stdio ones that don't collide with yours
use the stdio declarations, and only write new functions that use the same structures, defines, enums etc... as stdio
rewrite the necessary declarations and code that allows you not to include the system headers anymore
use another naming convention, like my_iob in both your header file, and in your code.
The last two are probably the best in your case, since you still have some collisions coming from a header file.
For instance, your code might not include stdio.h, but another header file you include might do it, indirectly...

Header naming convention

From
How can I define a C function in one file, then call it from another?
Say I define a function in the file func1.c, and I want to call it from
the file call.c, how would I accomplish this?
and the answer
You would put a declaration for the function in the file func1.h, and add
#include "func1.h" in call.c. Then you would compile or link func1.c
and call.c together (details depend on which C system).
My question is does the name of the header file have to be func1.h, as in <name-of-c-file>.h, or is this just a best practice? Please provide link for reference if possible.

In C, you don't have to call your files anything in particular. You can call your header mylongheadername.h, call your source file a.c, and have a function in it called justArandomFunctionName.
However, you should be aware that your source file needs to include your header file. Generally there is a strong link between headers and source files, so that's the reason this is just about always done in this way. However, the following is completely valid:
a.c : func1 implementation
      func2 implementation
b.c : func3 implementation
      func4 implementation
c.h : func1 declaration
      func3 declaration
d.h : func2 declaration
      func4 declaration
However, a layout like this can cause problems (you have to put extra work into structuring the files correctly), and it's simply poor practice. Still, the way header files are used is mostly convention, and barely any of it is enforced by the language.
Then there is the question how this can work if the header file does not know where the function is defined. The idea of this is that it doesn't need to know.
Basically, all your header does is tell your compiler that somewhere you defined a function that fits a certain profile (what name, what parameters, what return type). When your compiler reads this, basically all it does is mix all this info into a fancy name, which it will then insert into the file that is calling, which means it still doesn't do anything. The next step you need to take is to use the linker to turn the compiled versions of each of your files into a single executable. This does a number of things, but one of the most important ones is that it resolves all those fancy names the compiler cooked up. However, the way that your linker does this, is that it just reads all the compiled versions of your files and matches the definition of functions to their actual location in other code. Because it just handles all you have at the same time, it doesn't matter where your functions were defined and the header file never needs to know this.

No, header files do not need to match any corresponding C source file. It is as you've said just convention.

Compiling with header files

Why do I have to specifically compile a C source file with:
gcc prog.c -lm
even when I have already included the specific header file with:
#include <math.h>

The #include file tells the compiler what a function looks like, as in what type it returns and how many parameters of what types it takes, but it doesn't tell the compiler the contents.
The -lm flag links in the actual math library, which contains the code for the functions to be called.
It works the same way with printf(), fread() and other standard functions. When you include stdio.h, you don't actually include the code of the functions, only their declarations. Because the C library is implicitly linked without you having to do anything about it, you don't notice this.

Because you need to tell the compiler which math library to link with. This has nothing to do with including math.h.

Similarly to your own code, which should have header files (.h) for function declarations and source files (.c) for function definitions, the code for the math library is in two parts. The header file, which you include, contains the function declarations:
double sqrt(double n);
However, it doesn't contain anything about how these functions work. This code is in a separate file which you have to link in, similarly to how you link different source files to create an application.

Because in C, there is technically absolutely no connection between the header file and the library. There can be more header files than libraries, or the other way round. It's just a matter of convention (and of course it makes some sense) to have a 1:1 relation in most cases.
