gcc - 2 versions, different treatment of inline functions

Recently I've come across a problem in my project. I normally compile it in gcc-4, but after trying to compile in gcc-3, I noticed a different treatment of inline functions. To illustrate this I've created a simple example:
main.c:
#include "header.h"
#include <stdio.h>
int main()
{
    printf("f() %i\n", f());
    return 0;
}
file.c:
#include "header.h"
int some_function()
{
    return f();
}
header.h:
inline int f()
{
    return 2;
}
When I compile the code in gcc-3.4.6 with:
gcc main.c file.c -std=c99 -O2
I get a linker error (multiple definition of f), and the same happens if I remove the -O2 flag. I know the compiler does not have to inline anything if it doesn't want to, so I assumed it placed f in the object file instead of inlining it for both main.c and file.c, hence the multiple definition error. Obviously I could fix this by making f static, at worst ending up with a few copies of f in the binary.
But I tried compiling this code in gcc-4.3.5 with:
gcc main.c file.c -std=c99 -O2
And everything worked fine, so I assumed the newer gcc inlined f in both cases and there was no function f in the binary at all (checked in gdb and I was right).
However, when I removed the -O2 flag, I got two undefined references to int f().
And here, I really don't understand what is happening. It seems like gcc assumed f would be inlined, so it didn't add it to the object file, but later (because there was no -O2) it decided to generate calls to these functions instead of inlining and that's where the linker error came from.
Now comes the question: how should I define and declare simple and small functions, which I want inline, so that they can be used throughout the project without the fear of problems in various compilers? And is making all of them static the right thing to do? Or maybe gcc-4 is broken and I should never have multiple definitions of inline functions in a few translation units unless they're static?

Yes, the behavior changed from gcc-4.3 onwards. The gcc inline doc (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html) details this.
Short story: in the old versions, plain inline only tells gcc to inline calls to the function from the same file scope. It does not tell gcc that all callers are in that file, so gcc also keeps a linkable version of f() around, which explains your duplicate symbol error above.
Gcc 4.3 changed this behavior to be compatible with C99. Under C99 rules a plain inline definition does not provide a linkable f() at all, so when the calls are not inlined (no -O2) the linker finds no definition, which explains your undefined references.
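For reference, here is a minimal sketch of the C99 pattern, collapsed into one file for brevity. In a real project the first definition would live in header.h and the extern inline line in exactly one .c file; the f(void) prototype style is my choice, not the question's.

```c
/* Normally in header.h: a C99 inline definition. By itself it does not
   emit an external symbol for f. */
inline int f(void)
{
    return 2;
}

/* Normally in exactly one .c file: this declaration makes that
   translation unit emit the external definition of f, which is what
   the linker uses for any call that was not inlined. */
extern inline int f(void);

int some_function(void)
{
    return f();
}
```

This relies on C99 inline semantics (gcc 4.3 or later with -std=c99 or -std=gnu99). Older gcc releases interpret the same code differently, so it is not a cross-version fix.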
And, to answer your specific question:
Now comes the question: how should I define and declare simple and small functions, which I want inline, so that they can be used throughout the project without the fear of problems in various compilers? And is making all of them static the right thing to do? Or maybe gcc-4 is broken and I should never have multiple definitions of inline functions in a few translation units unless they're static?
If you want portability across gcc versions use static inline.
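A sketch of what that could look like for the example in the question. The include guard and the (void) parameter lists are additions of mine, and the two files are shown together only to keep the sketch short.

```c
/* header.h (sketch): every translation unit that includes this gets its
   own private copy of f, so there is nothing for the linker to clash on
   and nothing left undefined when a call is not inlined. */
#ifndef HEADER_H
#define HEADER_H

static inline int f(void)
{
    return 2;
}

#endif /* HEADER_H */

/* file.c would then simply include header.h and call f(): */
int some_function(void)
{
    return f();
}
```

The cost is the one mentioned in the question: if the compiler decides not to inline, each object file may carry its own copy of f.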

Related

Resolve undefined reference by stripping unused code

Assume we have the following C code:
void undefined_reference(void);
void bad(void) {
undefined_reference();
}
int main(void) {}
The call in function bad produces the linker error undefined reference to 'undefined_reference', as expected. This function is not actually used anywhere in the code, though, so for the execution of the program this undefined reference doesn't matter.
Is it possible to compile this code successfully, such that bad simply gets removed as it is never called (similar to tree-shaking in JavaScript)?

This function is not actually used anywhere in the code!
You know that, I know that, but the compiler doesn't. It deals with one translation unit at a time. It cannot divine that there are no other translation units.
But main doesn't call anything, so there cannot be other translation units!
There can be code that runs before and after main (in an implementation-defined manner).
OK what about the linker? It sees the whole program!
Not really. Code can be loaded dynamically at run time (also by code that the linker cannot see).
So neither the compiler nor the linker even tries to find unused functions by default.
On some systems it is possible to instruct the compiler and the linker to try and garbage-collect unused code (and assume a whole-program view when doing so), but this is not usually the default mode of operation.
With gcc and gnu ld, you can use these options:
gcc -ffunction-sections -Wl,--gc-sections main.c -o main
Other systems may have different ways of doing this.

Many compilers (for example gcc) will compile and link it correctly if you do both of the following:
1. Enable optimizations.
2. Make function bad static. Otherwise, it has external linkage and has to be kept.
https://godbolt.org/z/KrvfrYYdn
Another way is to add a stub version of this function (with a pragma that displays a warning).
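The stub approach could look like the following sketch. The stub body, the stub_calls counter and the message text are illustrative assumptions; #pragma message is accepted by gcc and clang, and other compilers may need a different directive.

```c
#include <stdio.h>

/* Compile-time reminder that a placeholder is in use (gcc/clang). */
#pragma message("undefined_reference() is a stub; replace it with the real implementation")

/* Hypothetical counter, only here so the stub's use is observable. */
static int stub_calls = 0;

/* Stub definition that satisfies the linker until the real one exists. */
void undefined_reference(void)
{
    ++stub_calls;
    fputs("warning: stub undefined_reference() called\n", stderr);
}

void bad(void)
{
    undefined_reference();
}
```

With the stub in place the program links regardless of optimization level, and the run-time message makes it obvious if bad ever does get called.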

GCC/Clang How can I Force no Undefined Symbols?

I am making a program in C using GCC/Clang to compile, and I have the following issue: I am trying to make sure that the compiler doesn't let me leave any symbols in my program undefined. I understand that translation units should be able to compile with undefined symbols, but I don't want it to link any static libraries unless it can find all of its internal symbols. For example:
#include <stdio.h>
int add(int a, int b);
int main(void)
{
printf("Hello, World.\n");
}
I would expect this program to compile but not to link (unless, of course, I can find a way to state that add is part of a shared library). That is not the case: it compiles and links with no warnings or errors, even with -Werror -Wall --pedantic-errors. Is there any option in GCC/Clang that won't let a static library be built unless all of its internal symbols are defined? Otherwise, I think this is a disaster waiting to happen, especially in larger projects.
Thank you all in advance.

Using GDB without debug symbols

Assume the following code:
#include <iostream>
void test(){
//
}
int main(){
return 0;
}
Compiling without -g, I'm still able to set a breakpoint on main and test using GDB.
How is that possible? Is it related to symbol tables?
(gdb) b test
Breakpoint 1 at 0x400512

Here's what you're missing.
C++ is built around the concept of compiling and then linking. As such, during the compilation stage, the compiler assumes that the current file is just one file in a more complex program that will be eventually linked together.
When you write:
void test(){
//
}
The compiler has no choice but to assume that test is going to be called by code from another source file, and that will be compiled into a separate .o file. As such, it exports test's symbol despite the fact that no debug symbols are defined.
To see this effect in action, try the following. First, mark test as static. If you compile with optimization, you will see that test is no longer visible to gdb. In fact, it is no longer even defined. The compiler inlines it away.
Another way of making this happen is by passing g++ the -fwhole-program option. This option tells gcc to assume that the file being compiled is the whole program and that no other compilation unit will exist. This allows it, effectively, to treat all function and global definitions as static. Again, once you turn on optimizations, you will see that test is no longer visible to gdb.

Can printf get replaced by puts automatically in a C program?

#include <stdio.h>
int puts(const char* str)
{
return printf("Hiya!\n");
}
int main()
{
printf("Hello world.\n");
return 0;
}
This code outputs "Hiya!" when run. Could someone explain why?
The compile line is:
gcc main.c
EDIT: it's now pure C, and any extraneous stuff has been removed from the compile line.

Yes, a compiler may replace a call to printf by an equivalent call to puts.
Because you defined your own function puts with the same name as a standard library function, your program's behavior is undefined.
Reference: N1570 7.1.3:
All identifiers with external linkage in any of the following subclauses [this includes puts] are always reserved for use as identifiers with external linkage.
...
If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.
If you remove your own puts function and examine an assembly listing, you might find a call to puts in the generated code where you called printf in the source code. (I've seen gcc perform this particular optimization.)
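As a rough illustration of which calls are candidates for this substitution, consider the sketch below. Whether it actually happens depends on the compiler, its version and the flags used, and the function names are made up for the example.

```c
#include <stdio.h>

/* Constant format, no conversions, trailing newline, result unused:
   a compiler may emit puts("Hello world.") here instead. */
void greet_plain(void)
{
    printf("Hello world.\n");
}

/* Uses a %i conversion and the return value, so it cannot be turned
   into a plain puts call. */
int greet_formatted(int n)
{
    return printf("f() %i\n", n);
}
```

Comparing the assembly for the two functions (for example with gcc -O2 -S) is a quick way to see whether your compiler performed the replacement.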

It depends upon the compiler and the optimization level. Most recent versions of GCC, on some common systems, with some optimizations, are able to do such an optimization (replacing a simple printf with puts, which AFAIU is legal w.r.t. standards like C99).
You should enable warnings when compiling (e.g. try first to compile with gcc -Wall -g, then debug with gdb, then when you are confident with your code compile it with gcc -Wall -O2)
BTW, redefining puts is really really ugly, unless you do it on purpose (i.e. you are coding your own C library, and then you have to obey the standards). You are getting some undefined behavior (see also this answer about possible consequences of UB). Actually you should avoid redefining names mentioned in the standard, unless you really really know what you are doing and what is happening inside the compiler.
Also, if you compiled with static linking like gcc -Wall -static -O main.c -o yourprog I'll bet that the linker would have complained (about multiple definition of puts).
But IMNSHO your code is plain wrong, and you know that.
Also, you could compile to get the assembler, e.g. with gcc -fverbose-asm -O -S; and you could even ask gcc to emit a lot of "dump" files with gcc -fdump-tree-all -O, which might help you understand what gcc is doing.
Again, this particular optimization is valid and very useful: the printf routine of any libc has to "interpret" the format string at run time (handling %s etc. specially), which is in practice quite slow. A good compiler is right to avoid calling printf (replacing it with puts) when possible.
BTW gcc is not the only compiler doing that optimization. clang also does it.
Also, if you compile with
gcc -ffreestanding -O2 almo.c -o almo
the almo program shows Hello world.
If you want another fancy and surprising optimization, try to compile
// file bas.c
#include <stdlib.h>
int f (int x, int y) {
    int r;
    int* p = malloc(2*sizeof(int));
    p[0] = x;
    p[1] = y;
    r = p[0]+p[1];
    free (p);
    return r;
}
with gcc -O2 -fverbose-asm -S bas.c then look into bas.s; you won't see any call to malloc or to free (actually, no call machine instruction is emitted) and again, gcc is right to optimize (and so does clang)!
PS: Gnu/Linux/Debian/Sid/x86-64; gcc is version 4.9.1, clang is version 3.4.2

Try ltrace on your executable. You will see that printf gets replaced by puts call by the compiler. This depends on the way you called printf
An interesting reading on this is here

Presumably, your library's printf() calls puts ().
Your puts() is replacing the library version.

Linking libraries built with different preprocessor flags or C standards

Scenario 1:
I want to link a new library (libA) into my program. libA was built using gcc with the -std=gnu99 flag, while the current libraries of my program were built without that option (and let's assume gcc uses -std=gnu89 by default).
Scenario 2:
libB was built with some preprocessor flags like "-D_XOPEN_SOURCE -D_XOPEN_SOURCE_EXTENDED" to enable XPG4 features, e.g. the msg_control member of struct msghdr, while libC was built without those preprocessor flags and is then linked against libB.
Is it wrong to link libraries built with different preprocessor flags or C standards?
My concern is mainly about structure definitions mismatch.
Thanks.

Scenario 1 is completely safe for you. The -std= option in GCC checks code for compatibility with the standard but has nothing to do with the ABI, so feel free to combine precompiled code built with different -std options.
Scenario 2 may be unsafe. I will put here just one simple example, real cases may be much more tricky.
Consider, that you have some function, like:
#ifdef MYDEF
int foo(int x) { ... }
#else
int foo(float x) { ... }
#endif
Say you compile a.o with -DMYDEF and b.o without, and function bar from a.o calls function foo in b.o. You link them together and everything seems fine. Then everything fails at run time, and you may have a very hard time debugging why you are passing an int from one module while the callee expects a float.
Some more tricky cases may include conditionally defined structure fields, calling conventions, global variable sizes.
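The conditionally defined structure field case can be sketched as follows. The struct, its members and the LIBDEMO_WITH_EXTRA macro are all hypothetical, chosen only to show how one flag changes the layout.

```c
#include <stddef.h>

/* LIBDEMO_WITH_EXTRA is a hypothetical feature macro. */
struct msg {
    int id;
#ifdef LIBDEMO_WITH_EXTRA
    void *extra;   /* present only in objects built with the flag */
#endif
    int len;
};

/* Helpers that report the layout seen by this translation unit. */
size_t msg_size(void) { return sizeof(struct msg); }
size_t msg_len_offset(void) { return offsetof(struct msg, len); }
```

Built without the macro, len sits directly after id. Built with -DLIBDEMO_WITH_EXTRA, len moves to a later offset and the struct grows, so two objects that disagree about the flag will silently read and write the wrong bytes when they pass a struct msg between them.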
P.S. Assuming all your sources are written in the same language, varying only std options and macro definitions. Combining C and C++ code is really tricky sometimes, agree with Mikhail.

The few times I have encountered structure definition mismatches were when combining C and C++ code; in those cases there was a clear warning that something terrible was happening.
Something like
/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: Warning: size of symbol `tree' changed from 324 in /tmp/ccvx8fpJ.o to 328 in gpu.o
See that question.