Linking libraries built with different preprocessor flags or C standards - c

Scenario 1:
I want to link a new library (libA) into my program. libA was built using gcc with the -std=gnu99 flag, while the current libraries of my program were built without that option (and let's assume gcc uses -std=gnu89 by default).
Scenario 2:
libB was built with some preprocessor flags like "-D_XOPEN_SOURCE -D_XOPEN_SOURCE_EXTENDED" to enable XPG4 features, e.g. the msg_control member of struct msghdr. libC was built without those preprocessor flags, and is then linked against libB.
Is it wrong to link libraries built with different preprocessor flags or C standards?
My concern is mainly about structure definitions mismatch.
Thanks.

Scenario 1 is completely safe for you. The -std= option in GCC selects which language dialect the compiler accepts, but it has nothing to do with the ABI, so you can freely combine precompiled code built with different -std options.
Scenario 2 may be unsafe. I will put here just one simple example, real cases may be much more tricky.
Consider, that you have some function, like:
#ifdef MYDEF
int foo(int x) { ... }
#else
int foo(float x) { ... }
#endif
Say you compile a.o with -DMYDEF and b.o without, and function bar in a.o calls function foo in b.o. You link them together and everything seems to be fine. Then everything fails at run time, and you may have a very hard time working out why you are passing an int from one module while the callee expects a float.
Some more tricky cases may include conditionally defined structure fields, calling conventions, global variable sizes.
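To illustrate the "conditionally defined structure fields" case, here is a minimal sketch. The two struct types stand in for one header compiled with and without a hypothetical -DWITH_EXTRA; all names are made up for illustration, and real code would have a single struct name with an #ifdef inside it.

```c
#include <stddef.h>

/* What one object file sees when built with a hypothetical -DWITH_EXTRA. */
struct msg_with_extra {
    int   id;
    void *extra;   /* field that only exists when the macro is defined */
    int   length;
};

/* What another object file sees when built without the macro. */
struct msg_without_extra {
    int id;
    int length;
};

/* Returns nonzero when the two views disagree on the total size or on
 * where `length` lives, i.e. when passing the struct between the two
 * object files would read or write the wrong bytes. */
int layouts_disagree(void)
{
    return sizeof(struct msg_with_extra) != sizeof(struct msg_without_extra)
        || offsetof(struct msg_with_extra, length)
           != offsetof(struct msg_without_extra, length);
}
```

The linker sees neither struct layout, only symbol names, so nothing in the link step would flag this.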
P.S. Assuming all your sources are written in the same language, varying only std options and macro definitions. Combining C and C++ code is really tricky sometimes, agree with Mikhail.

The few times I have encountered a structure definition mismatch were when combining C and C++ code; in those cases there was a clear warning that something terrible was happening.
Something like
/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: Warning: size of symbol `tree' changed from 324 in /tmp/ccvx8fpJ.o to 328 in gpu.o
See that question.

Related

GCC how to stop false positive warning implicit-function-declaration for functions in ROM?

I want to get rid of all implicit-function-declaration warnings in my codebase. But there is a problem because some functions are
programmed into the microcontroller ROM at the factory and during linking a linker script provides only the function address. These functions are called by code in the SDK.
During compilation gcc of course emits the warning implicit-function-declaration. How can I get rid of this warning?
To be clear I understand why the warning is there and what does it mean. But in this particular case the developers of SDK guarantee that the code will work with implicit rules (i.e. implicit function takes only ints and returns an int). So this warning is a false positive.
This is gnu-C-99 only, no c++.
Ideas:
Guess the argument types, write a prototype in a header and include that?
Tell gcc to treat such functions as false positive with some gcc attribute?
You can either create a prototype function in a header, or suppress the warnings with the following:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wimplicit-function-declaration"
/* line where GCC complains about implicit function declaration */
#pragma GCC diagnostic pop
Write a small program that generates a header file romfunctions.h from the linker script, with a line like this
int rom_function();
for each symbol defined by the ROM. Run this program from your Makefiles. Change all of the files that use these functions to include romfunctions.h. This way, if the linker script changes, you don't have to update the header file by hand.
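A minimal sketch of such a generator follows. It assumes the linker script defines each ROM symbol on its own line in the form `PROVIDE ( name = 0x40001234 );`. That format, and the function names used here, are assumptions for illustration; adjust the pattern to whatever your SDK's script actually contains.

```c
#include <stdio.h>
#include <string.h>

/* Assumed linker-script line format: PROVIDE ( name = 0x40001234 );
 * On a match, writes "int name();" into out and returns 1; otherwise 0. */
int rom_decl_from_line(const char *line, char *out, size_t out_size)
{
    char name[64];
    int consumed = 0;   /* set by %n only if the '=' was also matched */

    if (sscanf(line, " PROVIDE ( %63[A-Za-z0-9_] =%n", name, &consumed) != 1
        || consumed == 0)
        return 0;

    snprintf(out, out_size, "int %s();", name);
    return 1;
}

/* Writes one declaration per matching script line, inside an include guard. */
void generate_rom_header(FILE *script, FILE *header)
{
    char line[256], decl[96];

    fputs("#ifndef ROMFUNCTIONS_H\n#define ROMFUNCTIONS_H\n", header);
    while (fgets(line, sizeof line, script))
        if (rom_decl_from_line(line, decl, sizeof decl))
            fprintf(header, "%s\n", decl);
    fputs("#endif\n", header);
}
```

A small main() that opens the linker script and romfunctions.h and calls generate_rom_header() is all the Makefile rule would need to run.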
Because most of my programming expertise was acquired by self-study, I intentionally have become somewhat anal about resolving non-fatal warnings, specifically to avoid picking up bad coding habits. But, this has revealed to me that such bad coding habits are quite common, even from formally trained programmers. In particular, for someone like me who is also anal about NOT using MS Windows, my self-study of so-called platform-independent code such as OpenGL and Vulkan has revealed a WORLD of bad coding habits, particularly as I examine code written assuming the student was using Visual Studio and a Windows C/C++ compiler.
Recently, I encountered NUMEROUS non-fatal warnings as I designed an Ubuntu Qt Console implementation of an online example of how to use SPIR-V shaders with OpenGL. I finally threw in the towel and added the following lines to my qmake .PRO file to get rid of the non-fatal-warnings (after, first, studying each one and convincing myself it could be safely ignored) :
QMAKE_CFLAGS += -Wno-implicit-function-declaration \
                -Wno-address-of-packed-member
[Completely rewritten due to comments]
You are compiling the vendor SDK with your own code. This is not typically what you want to do.
What you do is build their SDK files with gcc -c -Wno-implicit-function-declaration and your own files with gcc -c, and then link everything, for example with gcc -o output all-your-c-files all-their-o-files.
C does not require that declarations be prototypes, so you can get rid of the problem (which should be a hard error, not a warning, since implicit declarations are not valid C) by using a non-prototype declaration, which requires only knowing the return type. For example:
int foo();
Since "implicit declarations" were historically treated as returning int, you can simply use int for all of them.
If you are writing a C program, use
#include <stdio.h>

How is "__builtin_va_list" implemented?

I want to delve into the implementation of function "printf" in C on macOS. "printf" uses the <stdarg.h> header file. I open the <stdarg.h> file and find that va_list is just a macro.
So, I am really curious about how the __builtin_va_list is implemented? I know it is compiler-specific. Where can I find the definition of the __builtin_va_list? Should I download the source code of clang compiler?
So, I am really curious about how the __builtin_va_list is implemented?
__builtin_va_list is implemented inside the GCC compiler (or the Clang/LLVM one). So you should study the GCC compiler source code to understand details.
Look into gcc/builtins.def & gcc/builtins.c for more.
I am less familiar with Clang, which implements the same builtin.
But both GCC & Clang are open source or free software. They are complex beasts (several million lines of code each), so you may need years of work to understand them.
Be aware that the ABI of your compiler matters. Look for example into X86 psABI for more details.
BTW, Grady Player commented:
Pops the correct number of bytes off of the stack for each of those tokens...
Unfortunately, today it is much more complex than that. On current processors and ABIs the calling conventions use processor registers to pass some arguments (and the devil is in the details).
Should I download the source code of clang compiler?
Yes, and you also need to allocate several years of work to understand the details.
A few years ago, I did write some tutorial slides and links to external documentation regarding GCC implementation, see my GCC MELT documentation page (a bit rotten).
In Clang 9, this is implemented in
clang\lib\AST\ASTContext.cpp
call graph:
getVaListTagDecl
=>getBuiltinVaListDecl
=>CreateVaListDecl
=>Create***BuiltinVaListDecl
for example:
=>CreateCharPtrBuiltinVaListDecl
=>CreateCharPtrNamedVaListDecl
=>buildImplicitTypedef
When __builtin_va_list appears in the preprocessed source, the compiler calls getVaListTagDecl to build a TypedefDecl AST node and insert it into the AST. The typedef does not exist in any source file; it is generated during compilation, as if the source contained:
typedef *** __builtin_va_list;
//for example
typedef char* __builtin_va_list;
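To see the builtin from the user's side, here is a small sketch that uses the compiler builtins directly instead of the <stdarg.h> macros. It only works with GCC and Clang, and the function name sum_ints is made up for illustration. In the <stdarg.h> shipped with those compilers, va_list, va_start, va_arg and va_end are typically thin wrappers around exactly these names.

```c
/* Sums `count` int arguments without including <stdarg.h>.
 * GCC/Clang only: the __builtin_va_* names are compiler extensions. */
int sum_ints(int count, ...)
{
    __builtin_va_list ap;            /* the type <stdarg.h> exposes as va_list */
    int total = 0;

    __builtin_va_start(ap, count);   /* counterpart of va_start(ap, count) */
    for (int i = 0; i < count; i++)
        total += __builtin_va_arg(ap, int);   /* counterpart of va_arg */
    __builtin_va_end(ap);            /* counterpart of va_end */
    return total;
}
```

What the compiler does with `ap` (a plain pointer, or a small structure tracking register save areas) depends on the target ABI, which is why the typedef is generated per target rather than written in a header.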
This answer, for Clang, just shows how I go about finding the implementation of a builtin function.
I'm interested in the implementation of std::atomic<T>. If T is not a trivial type, Clang uses a lock to guard its atomicity. Looking at this answer first, I found a builtin function named __c11_atomic_store. The question is: how is this builtin function implemented in Clang?
Searching for Builtin in the Clang codebase, I found this in clang/Basic/Builtins.def:
// Some of our atomics builtins are handled by AtomicExpr rather than
// as normal builtin CallExprs. This macro is used for such builtins.
#ifndef ATOMIC_BUILTIN
#define ATOMIC_BUILTIN(ID, TYPE, ATTRS) BUILTIN(ID, TYPE, ATTRS)
#endif
// C11 _Atomic operations for <stdatomic.h>.
ATOMIC_BUILTIN(__c11_atomic_init, "v.", "t")
ATOMIC_BUILTIN(__c11_atomic_load, "v.", "t")
ATOMIC_BUILTIN(__c11_atomic_store, "v.", "t")
ATOMIC_BUILTIN(__c11_atomic_exchange, "v.", "t")
...
The keywords are AtomicExpr and CallExpr. I then checked every caller of AtomicExpr's constructor, but didn't find any useful information. So my guess is that in the parse phase, when the parser matches a call to a builtin function, it constructs a CallExpr in the AST with a builtin flag, and the code generation phase then emits the implementation.
Checking CodeGen, I found the answer in lib/CodeGen/CGBuiltin.cpp and lib/CodeGen/CGAtomic.cpp.
You can check CodeGenFunction::EmitVAArg; I hope that is useful for you.

Using standard function name in C

I am compiling one program called nauty. This program uses a canonical function name getline which is also part of the standard GNU C library.
Is it possible to tell GCC at compile time to use this program defined function?
One solution:
Now you have declaration of the function in some application .h file something like:
int getline(...); // the custom getline
Change that to:
int application_getline(...); // the custom getline
#define getline application_getline
I think that should do it. It will also fix the .c file where the function is defined, assuming it includes that .h file.
Also, use grep or your editor's "find in files" to check every place where that macro takes effect and make sure it does not cause trouble.
Important: in every file, make sure that .h file is included after any standard headers which may use the getline symbol. You do not want that macro to take effect in those...
Note: this is an ugly hack. Then again, almost everything involving C preprocessor macros can be considered an ugly hack, by some criteria ;). And getting existing incompatible code bases to co-operate is often a case where a hack is acceptable, especially if long-term maintenance is not a concern.
Note2: As per this answer and as pointed out in a comment, this is undefined behavior by the C standard. Keep this in mind if the intention is to maintain the software for longer than just getting a working executable one time. But I added a better solution.
Note that you may trigger undefined behavior if the GCC header where standard getline is defined is actually used in your code. These are the relevant information sources (emphasis mine):
The libc manual:
1.3.3 Reserved Names
The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them. There are several reasons for these restrictions:
[...]
and the C99 draft standard (N1256):
7.1.3 Reserved identifiers
1
Each header declares or defines all identifiers listed in its associated subclause, and
optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.
[...]
2
No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.
3
If the program removes (with #undef) any macro definition of an identifier in the first
group listed above, the behavior is undefined.
Thus even the macro trick suggested in another post will invoke undefined behavior if you include the header of getline in your code.
Unfortunately, in this case the only safe bet is to manually rename all getline invocations.
C demands unique function names.
But you can use the -fno-builtin or -ffreestanding gcc flags.
See the description of these flags in the gcc man page.
A common approach is to use prefixes which form some sort of namespace. Sometimes you can see macros used for this to make changing the namespace name easier, e.g.
#define MYAPP(f) myapp_##f
Which is then used like
int MYAPP(add)(int a, int b) {
return a + b;
}
This defines a function myapp_add which you can also invoke like
MYAPP(add)(3, 5);
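Applied to the getline clash from the question, the same prefix macro could look like the sketch below. The function body and its signature are hypothetical stand-ins, not nauty's actual getline; the point is only that the symbol the linker sees becomes myapp_getline.

```c
#include <stddef.h>

/* MYAPP(f) pastes the prefix onto the name: MYAPP(getline) -> myapp_getline. */
#define MYAPP(f) myapp_##f

/* Defines myapp_getline (hypothetical signature): copies src up to the
 * first newline into buf, at most size-1 characters, NUL-terminates,
 * and returns the number of characters copied. */
size_t MYAPP(getline)(char *buf, size_t size, const char *src)
{
    size_t n = 0;

    if (size == 0)
        return 0;
    while (n + 1 < size && src[n] != '\0' && src[n] != '\n') {
        buf[n] = src[n];
        n++;
    }
    buf[n] = '\0';
    return n;
}
```

Callers write MYAPP(getline)(buf, sizeof buf, text), so changing the prefix later only means editing the one #define.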
This standards compliance issue started to bug me, so I did a bit of experimenting. Here's a 2nd answer, which is possibly better than the currently accepted answer of mine.
First, solution:
Just define macro _XOPEN_SOURCE with value 699, by adding this to compiler command line options
-D_XOPEN_SOURCE=699
How exactly depends on the application's build system, but one way that will probably work is to define the CFLAGS environment variable and see if it takes effect when rebuilding:
export CFLAGS="-D_XOPEN_SOURCE=699"
Other alternative would be to add #define _XOPEN_SOURCE 699 before includes in every .c file of the application, in case it uses some esoteric build system and you can't get it added to compile options, but doing it from command line is by far preferable.
Then some explanation:
The man page of getline specifies that getline is declared only under certain standards, such as when _XOPEN_SOURCE>=700. So, by defining a smaller value before including the relevant header, we exclude the library declaration. More information about these feature-test macros can be found in the GNU libc manual.
I expected there to be some linker issues too, but there weren't, and my investigation resulted in this question here. To summarize, the linker will prefer a symbol from the linked object files (at least with gcc), and will only look at dynamic libraries if it has not found the symbol otherwise. So, since getline is not an ISO C symbol, the GNU libc documentation quoted in this answer seems to imply that, after using the _XOPEN_SOURCE trick of this answer, it's OK to use the name in an application. Still, beware of other libraries that use the POSIX getline and end up calling the application's function (probably with different parameters, resulting in undefined behaviour, probably a crash).
Here is a neat solution to your problem. The trick is LD_PRELOAD.
I did a similar thing in one of my earlier question posts. See the following.
Hack the standard function in library and call the native library function afterwards
You can define getline() in a separate file. This will keep the design clean too. Now, compile that C file:
$gcc -c -g -fPIC <file.c>
-g is for debugging.
-fPIC is for position-independent code. It lets the text segment be shared between processes, which helps to save RAM.
This will create file.o. Now, make a shared object from it:
$gcc -shared -o libfile.so file.o
Now, link your main file with this shared object.
gcc -g main.c -o main.out -L. -lfile
(-L. assumes libfile.so is in the current directory.) When executing, use LD_PRELOAD; this makes the program use your library instead of the native API.
$LD_PRELOAD=<path to libfile.so>/libfile.so ./main.out

Macros giving problems with dladdr()

I have implemented tracing behavior using the -finstrument-functions option of gcc and this (simplified) code:
#define _GNU_SOURCE   /* dladdr() is a GNU extension */
#include <dlfcn.h>
#include <stdio.h>

void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    Dl_info di;
    if (dladdr(this_fn, &di))
        printf("entered %s\n", (di.dli_sname ? di.dli_sname : "<unknown>"));
}
This works great, except for one thing: macros are processed as well, but the function prints the information of the function which contains the macro.
So functions containing macros have their information printed multiple times (which is of course undesired).
Is there any way to detect that a macro is being processed? Or is it possible to turn off instrumenting macros altogether?
PS Same problems occur with sizeof()
Edit: To clarify: I am looking for a solution to prevent macros messing with the instrumented functions (which they should not be doing). Not for methods to trace macros, functions and/or other things.
Macros are expanded inline by the preprocessor, therefore there is no way to distinguish between a function called directly from the code and called from a macro.
The only possible way around this would be to have your macros set a global flag, which your tracing function will check.
This is of course less than foolproof, since any calls done by a function called from a macro will also appear the same way.
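The global-flag idea can be sketched as follows. All names here are hypothetical, and trace_enter() merely stands in for the body of __cyg_profile_func_enter so the logic can be shown without -finstrument-functions. As noted above, anything called while the flag is raised is suppressed too.

```c
/* Nesting depth of "we are currently inside a flagged macro". */
static int g_in_macro = 0;
/* Number of function entries the hook actually reported. */
static int g_reported = 0;

/* Stand-in for the body of __cyg_profile_func_enter. */
void trace_enter(void)
{
    if (g_in_macro > 0)
        return;        /* entry happened inside a flagged macro: stay quiet */
    g_reported++;
}

/* Hypothetical wrapper: raise the flag around whatever the macro expands to. */
#define FLAGGED(stmt) do { g_in_macro++; stmt; g_in_macro--; } while (0)

/* Accessor so callers can inspect how many entries were reported. */
int reported_count(void)
{
    return g_reported;
}
```

Using a counter rather than a boolean keeps the flag correct when one flagged macro expands to another.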
If you really want to dig into it you can see my response to breakdown c++ code size. C++ templates are really just more formal macros, so this may work for you.
It also may not, since __LINE__ and __FILE__ within a macro correspond to the caller.
edit
from my comment on this:
$ gcc -E foo.c | gcc -x cpp-output -c -finstrument-functions - -o foo.o
preprocess piped into gcc expecting preprocessed input on standard input

Why do you have to link the math library in C?

If I include <stdlib.h> or <stdio.h> in a C program, I don't have to link these when compiling, but I do have to link to <math.h>, using -lm with GCC, for example:
gcc test.c -o test -lm
What is the reason for this? Why do I have to explicitly link the math library, but not the other libraries?
The functions in stdlib.h and stdio.h have implementations in libc.so (or libc.a for static linking), which is linked into your executable by default (as if -lc were specified). GCC can be instructed to avoid this automatic link with the -nostdlib or -nodefaultlibs options.
The math functions in math.h have implementations in libm.so (or libm.a for static linking), and libm is not linked in by default. There are historical reasons for this libm/libc split, none of them very convincing.
Interestingly, the C++ runtime libstdc++ requires libm, so if you compile a C++ program with GCC (g++), you will automatically get libm linked in.
Remember that C is an old language and that FPUs are a relatively recent phenomenon. I first saw C on 8-bit processors where it was a lot of work to do even 32-bit integer arithmetic. Many of these implementations didn't even have a floating point math library available!
Even on the first 68000 machines (Mac, Atari ST, Amiga), floating point coprocessors were often expensive add-ons.
To do all that floating point math, you needed a pretty sizable library. And the math was going to be slow. So you rarely used floats. You tried to do everything with integers or scaled integers. When you had to include math.h, you gritted your teeth. Often, you'd write your own approximations and lookup tables to avoid it.
Trade-offs existed for a long time. Sometimes there were competing math packages called "fastmath" or such. What's the best solution for math? Really accurate but slow stuff? Inaccurate but fast? Big tables for trig functions? It wasn't until coprocessors were guaranteed to be in the computer that most implementations became obvious. I imagine that there's some programmer out there somewhere right now, working on an embedded chip, trying to decide whether to bring in the math library to handle some math problem.
That's why math wasn't standard. Many or maybe most programs didn't use a single float. If FPUs had always been around and floats and doubles were always cheap to operate on, no doubt there would have been a "stdmath".
Because of ridiculous historical practice that nobody is willing to fix. Consolidating all of the functions required by C and POSIX into a single library file would not only avoid this question getting asked over and over, but would also save a significant amount of time and memory when dynamic linking, since each .so file linked requires the filesystem operations to locate and find it, and a few pages for its static variables, relocations, etc.
An implementation where all functions are in one library and the -lm, -lpthread, -lrt, etc. options are all no-ops (or link to empty .a files) is perfectly POSIX conformant and certainly preferable.
Note: I'm talking about POSIX because C itself does not specify anything about how the compiler is invoked. Thus you can just treat gcc -std=c99 -lm as the implementation-specific way the compiler must be invoked for conformant behavior.
Because time() and some other functions are defined in the C library (libc) itself, and GCC always links against libc unless you pass the -nostdlib or -nodefaultlibs option. However, math functions live in libm, which is not implicitly linked by gcc.
An explanation is given here:
So if your program is using math functions and including math.h, then you need to explicitly link the math library by passing the -lm flag. The reason for this particular separation is that mathematicians are very picky about the way their math is being computed and they may want to use their own implementation of the math functions instead of the standard implementation. If the math functions were lumped into libc.a it wouldn't be possible to do that.
[Edit]
I'm not sure I agree with this, though. If you have a library which provides, say, sqrt(), and you pass it before the standard library, a Unix linker will take your version, right?
There's a thorough discussion of linking to external libraries in An Introduction to GCC - Linking with external libraries. If a library is a member of the standard libraries (like stdio), then you don't need to specify to the compiler (really the linker) to link them.
After reading some of the other answers and comments, I think the libc.a reference and the libm reference that it links to both have a lot to say about why the two are separate.
Note that many of the functions in 'libm.a' (the math library) are defined in 'math.h' but are not present in libc.a. Some are, which may get confusing, but the rule of thumb is this--the C library contains those functions that ANSI dictates must exist, so that you don't need the -lm if you only use ANSI functions. In contrast, `libm.a' contains more functions and supports additional functionality such as the matherr call-back and compliance to several alternative standards of behavior in case of FP errors. See section libm, for more details.
As ephemient said, the C library libc is linked by default and this library contains the implementations of stdlib.h, stdio.h and several other standard header files. Just to add to it, according to "An Introduction to GCC" the linker command for a basic "Hello World" program in C is as below:
ld -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o
/usr/lib/crti.o /usr/lib/gcc-lib/i686/3.3.1/crtbegin.o
-L/usr/lib/gcc-lib/i686/3.3.1 hello.o -lgcc -lgcc_eh -lc
-lgcc -lgcc_eh /usr/lib/gcc-lib/i686/3.3.1/crtend.o /usr/lib/crtn.o
Notice the option -lc in the third line that links the C library.
If I put stdlib.h or stdio.h, I don't have to link those but I have to link when I compile:
stdlib.h, stdio.h are the header files. You include them for your convenience. They only forecast what symbols will become available if you link in the proper library. The implementations are in the library files, that's where the functions really live.
Including math.h is only the first step to gaining access to all the math functions.
Also, you don't have to link against libm if you don't use its functions, even if you do a #include <math.h>, which only informs you and the compiler about the symbols.
stdlib.h, stdio.h refer to functions available in libc, which happens to be always linked in so that the user doesn't have to do it himself.
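A small sketch of the header/library distinction: the declaration that <stdio.h> would normally supply is written by hand here, and the program still links because the definition of puts lives in libc, which is linked by default. The function name say_hello is made up for illustration.

```c
/* Hand-written declaration, normally supplied by <stdio.h>.
 * It only tells the compiler what the symbol looks like; the
 * definition comes from libc at link time. */
int puts(const char *s);

/* Prints a greeting and passes on puts' return value,
 * which is non-negative on success. */
int say_hello(void)
{
    return puts("hello");
}
```

Doing the same with a hand-written `double sqrt(double);` and a run-time argument would compile just as well, but on a system with a separate libm the link step could fail until -lm is added, because no header, hand-written or included, pulls in a library.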
It's a bug. You shouldn't have to explicitly specify -lm any more. Perhaps if enough people complain about it, it will be fixed. (I don't seriously believe this, as the maintainers who are perpetuating the distinction are evidently very stubborn, but I can hope.)
I think it's kind of arbitrary. You have to draw a line somewhere (which libraries are default and which need to be specified).
It gives you the opportunity to replace it with a different one that has the same functions, but I don't think it's very common to do so.
I think GCC does this to maintain backwards compatibility with the original cc executable. My guess for why cc does this is because of build time -- cc was written for machines with far less power than we have now. A lot of programs don't have any floating-point math, and they probably took every library that wasn't commonly used out of the default. I'm guessing that the build time of the Unix OS and the tools that go along with it were the driving force.
I would guess that it is a way to make applications which don't use it at all perform slightly better. Here's my thinking on this.
x86 OSes (and I imagine others) need to store FPU state on context switch. However, most OSes only bother to save/restore this state after the app attempts to use the FPU for the first time.
In addition to this, there is probably some basic code in the math library which will set the FPU to a sane base state when the library is loaded.
So, if you don't link in any math code at all, none of this will happen, therefore the OS doesn't have to save/restore any FPU state at all, making context switches slightly more efficient.
Just a guess though.
The same base premise still applies to non-FPU cases (the premise being that it was to make apps which didn't make use of libm perform slightly better).
For example, if there is a soft-FPU, which was likely in the early days of C, then having libm separate could prevent a lot of large (and, if used, slow) code from unnecessarily being linked in.
In addition, if there is only static linking available, then a similar argument applies that it would keep executable sizes and compile times down.
stdio is part of the standard C library which, by default, GCC will link against.
The math function implementations are in a separate libm file that is not linked to by default, so you have to specify it -lm. By the way, there is no relation between those header files and library files.
The functions declared in headers like stdio.h and stdlib.h have their implementation in libc.so or libc.a, which the linker links in by default when you build the program.
But the functions declared in math.h have their implementations in libm.so or libm.a, which is separate from libc.so. It does not get linked by default, and you have to link it manually when building your program with GCC by using the -lm flag.
The math library was designed to be separate from the rest of the C library: the implementations behind the other headers get linked by default, but the one behind math.h doesn't.
Here read the item number 14.3, you could read it all if you wish:
Reason why math.h needs to be linked
Look at this article: Why do we have to link math.h in GCC?
Have a look at the usage:
Using the library
Note that -lm may not always need to be specified even if you use some C math functions.
For example, the following simple program:
#include <stdio.h>
#include <math.h>
int main() {
printf("output: %f\n", sqrt(2.0));
return 0;
}
can be compiled and run successfully with the following command:
gcc test.c -o test
It was tested on GCC 7.5.0 (on Ubuntu 16.04) and GCC 4.8.0 (on CentOS 7).
The post here gives some explanations:
The math functions you call are implemented by compiler built-in functions
See also:
Other Built-in Functions Provided by GCC
How to get the gcc compiler to not optimize a standard library function call like printf?

Resources