GCC/Clang: How can I force no undefined symbols? - c

I am making a program in C using GCC/Clang to compile, and I have the following issue: I am trying to make sure that the compiler doesn't let me leave any symbols in my program undefined. I understand that translation units should be able to compile with undefined symbols, but I don't want it to link any static libraries unless it can find all of its internal symbols. For example:
#include <stdio.h>
int add(int a, int b);
int main(void)
{
    printf("Hello, World.\n");
}
I would expect this program to compile but not to link (unless, of course, I can find a way to state that add is part of a shared library). That is not what happens: it compiles and links with no warnings or errors, even with -Werror -Wall --pedantic-errors. Is there any option in GCC/Clang that won't let a static library be built unless all of its internal symbols are defined? Otherwise, I think this is a disaster waiting to happen, especially in larger projects.
Thank you all in advance.
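Why the example links anyway: a declaration that is never used puts no reference into the object file, so the linker has nothing to resolve. Only a call (or taking the function's address) creates an undefined symbol. The sketch below assumes GCC or Clang on an ELF platform such as Linux, and the caller use_add is a hypothetical name. For shared libraries, GNU ld's -Wl,--no-undefined (equivalently -Wl,-z,defs) makes the link fail on unresolved symbols. A static archive, by contrast, is just a bundle of object files made by ar and is never linked on its own, so its missing symbols can only be reported when something that uses them is linked.

```c
/* Sketch assuming GCC or Clang on an ELF platform (e.g. Linux). */

/* A declaration only: by itself it puts nothing into the object file. */
int add(int a, int b);

/* Hypothetical caller. Because it actually calls add, the object file
   records an undefined reference to add that the linker must resolve. */
int use_add(void)
{
    return add(2, 3);
}

/* Delete this definition, compile with "gcc -c" and run nm on the .o file:
   add is listed with type U (undefined). Linking that object into an
   executable then fails with "undefined reference to `add'". */
int add(int a, int b)
{
    return a + b;
}
```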

Related

How does gcc resolve malloc?

Since I started learning C and C++, I have been confused by some special functions such as printf, malloc, free and fork, as well as new and the standard iostream objects cout and cin. Whenever I try to go to the definition of one of them, I get nothing or just a declaration; I could never find the actual definition. I am now learning about compilers and the linking process, and these things are still unclear to me.
Suppose a simple a.c:
#include<stdlib.h>
int main()
{
    int* a = malloc(sizeof(int));
}
The function malloc is declared in stdlib.h, but it is not defined there.
Now I use gcc to compile it:
gcc -Wall a.c -o out
This command executes successfully, so I am wondering: since there are no link options, how does gcc resolve the malloc function?
I also know it is implemented on top of OS-specific system calls, but what does the call tree look like? I don't care much about the allocation algorithm, because it is too complicated for me, but I want a general idea of how these C standard library functions work. Thanks!
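A short sketch of where the definition comes from, assuming a typical Linux system with glibc: the gcc driver passes the C library to the linker by default, so no link option is needed. Running gcc -v a.c shows the final linker command (including -lc), nm on the object file lists malloc as U (undefined), and ldd ./out shows libc.so.6 being loaded at run time. In glibc, malloc is ordinary user-space code that only occasionally asks the kernel for more memory through brk/sbrk or mmap. The helper make_int is hypothetical.

```c
#include <stdlib.h>

/* malloc is only declared in <stdlib.h>. Its definition lives in the C
   library (libc.so.6 on typical glibc systems), which the gcc driver links
   in by default, so no -lc is needed on the command line. */
int *make_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;   /* allocation can fail */
    *p = value;
    return p;          /* caller must free() the result */
}
```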

Can printf get replaced by puts automatically in a C program?

#include <stdio.h>
int puts(const char* str)
{
    return printf("Hiya!\n");
}
int main()
{
    printf("Hello world.\n");
    return 0;
}
This code outputs "Hiya!" when run. Could someone explain why?
The compile line is:
gcc main.c
EDIT: it's now pure C, and any extraneous stuff has been removed from the compile line.
Yes, a compiler may replace a call to printf by an equivalent call to puts.
Because you defined your own function puts with the same name as a standard library function, your program's behavior is undefined.
Reference: N1570 7.1.3:
All identifiers with external linkage in any of the following subclauses [this includes puts] are always reserved for use as identifiers with external linkage.
...
If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved
identifier as a macro name, the behavior is undefined.
If you remove your own puts function and examine an assembly listing, you might find a call to puts in the generated code where you called printf in the source code. (I've seen gcc perform this particular optimization.)
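A rough model of when GCC and Clang can do this rewrite: the format is a string literal with no % conversions, it ends in a newline, and the return value of printf is not used; the call then becomes puts of the same text without the trailing newline, because puts appends one itself. The helper could_become_puts is hypothetical and deliberately simplified (it ignores other rewrites, such as printf("%s\n", s) becoming puts(s)); it illustrates the shape of the check, not the compiler's actual code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of the printf -> puts rewrite.
   Returns true when printf(fmt) could be emitted as puts() of fmt minus its
   trailing newline. Assumes the caller ignores printf's return value. */
bool could_become_puts(const char *fmt)
{
    size_t len = strlen(fmt);

    /* Must end in '\n', since puts always writes one.
       A lone "\n" is typically turned into putchar instead. */
    if (len < 2 || fmt[len - 1] != '\n')
        return false;

    /* Conservatively reject any '%': printf would have to interpret
       conversions at run time. */
    return strchr(fmt, '%') == NULL;
}
```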
It depends upon the compiler and the optimization level. Most recent versions of GCC, on some common systems, with some optimizations, are able to do such an optimization (replacing a simple printf with puts, which AFAIU is legal w.r.t. standards like C99).
You should enable warnings when compiling (e.g. first compile with gcc -Wall -g, then debug with gdb, and only when you are confident in your code compile it with gcc -Wall -O2).
BTW, redefining puts is really, really ugly unless you do it on purpose (i.e. you are coding your own C library, in which case you have to obey the standards). You are getting undefined behavior (see also this answer about possible consequences of UB). In practice you should avoid redefining names mentioned in the standard unless you really know what you are doing and what is happening inside the compiler.
Also, if you compiled with static linking like gcc -Wall -static -O main.c -o yourprog I'll bet that the linker would have complained (about multiple definition of puts).
But IMNSHO your code is plain wrong, and you know that.
Also, you could compile to get the assembler, e.g. with gcc -fverbose-asm -O -S; and you could even ask gcc to spill a lot of "dump" files, with gcc -fdump-tree-all -O which might help you understanding what gcc is doing.
Again, this particular optimization is valid and very useful: the printf routine of any libc has to interpret the format string at run time (handling %s etc. specially), which is in practice quite slow. A good compiler is right to avoid calling printf (replacing it with puts) when possible.
BTW gcc is not the only compiler doing that optimization. clang also does it.
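Also, if you compile with
gcc -ffreestanding -O2 almo.c -o almo
the almo program prints Hello world. (-ffreestanding implies -fno-builtin, so GCC no longer treats printf as the standard library function and does not turn the call into puts).
If you want another fancy and surprising optimization, try to compile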
// file bas.c
#include <stdlib.h>
int f (int x, int y) {
    int r;
    int* p = malloc(2*sizeof(int));
    p[0] = x;
    p[1] = y;
    r = p[0]+p[1];
    free (p);
    return r;
}
with gcc -O2 -fverbose-asm -S bas.c then look into bas.s; you won't see any call to malloc or to free (actually, no call machine instruction is emitted) and again, gcc is right to optimize (and so does clang)!
PS: Gnu/Linux/Debian/Sid/x86-64; gcc is version 4.9.1, clang is version 3.4.2
Try ltrace on your executable. You will see that the compiler has replaced the printf call with a call to puts. Whether this happens depends on how you called printf.
An interesting read on this is here.
Presumably, your library's printf() calls puts ().
Your puts() is replacing the library version.

Given .dll, .lib, and .h, and would like to write a C program using a function in the DLL

So, I have TVZLib.h, TVZlib.dll, and TVZlib.lib, and I am using gcc to compile the following program (it's a simple test case). The compiler gives me the error:
"undefined reference to '_imp__TVZGetNavigationMatrix'"
Yet when I compile the program with a different parameter type in the function call, it complains that the parameter is not the correct type (it requires float *). To me, that means it has at least found the function, since it knows what it wants.
From my research, I can tell that people think it's to do with the linking of the library, or the order in which I link, but I've tried all of the gcc commands in all combinations, and all give me the same error, so I'm desperate for some help.
#include <stdlib.h>
#include <stdio.h>
#include "TVZLib.h"
int main() {
    float floatie = 2;
    float *ptr = &floatie;
    TVZGetNavigationMatrix(ptr);
    getchar();
    return 0;
}
Thanks a lot in advance!
My compiler command:
gcc dlltest.c -L. TVZLib.lib
The header file (TVZLib.h).
And the direct output:
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\ccuDpoiE.o:dlltest.c:(.text+0x2c): undefined reference to `_imp__TVZGetNavigationMatrix'
collect2: ld returned 1 exit status
It's been a while since I've been compiling natively on Windows...
Did you intend to link statically against TVZlib.lib? That's not happening.
By default, gcc will pick the dynamic version of a library if it finds both a static and a dynamic lib. If you want to force gcc to link statically, you can use the -static option.
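If memory serves me right, the _imp__ prefix is a sign that the header declares the function as imported from a DLL (__declspec(dllimport)): the generated code calls through the pointer _imp__TVZGetNavigationMatrix in the import address table, and that symbol has to be provided by an import library matching the DLL.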

Compile C program using dlopen and dlsym with -fPIC

I am having a problem about a wrong symbol resolution. My main program loads a shared library with dlopen and a symbol from it with dlsym. Both the program and the library are written in C.
Library code
int a(int b)
{
    return b+1;
}
int c(int d)
{
    return a(d)+1;
}
In order to make it work on a 64-bit machine, -fPIC is passed to gcc when compiling.
The program is:
#include <dlfcn.h>
#include <stdio.h>
int (*a)(int b);
int (*c)(int d);
int main()
{
    void* lib=dlopen("./libtest.so",RTLD_LAZY);
    a=dlsym(lib,"a");
    c=dlsym(lib,"c");
    int d = c(6);
    int b = a(5);
    printf("b is %d d is %d\n",b,d);
    return 0;
}
Everything runs fine if the program is NOT compiled with -fPIC, but it crashes with a segmentation fault when the program is compiled with -fPIC. Investigation revealed that the crash is due to the wrong resolution of symbol a. The crash occurs whenever a is called, whether from the library or from the main program (the latter case is observed by commenting out the line that calls c() in the main program).
No problems occur when calling c() itself, probably because c() is not called internally by the library itself, while a() is both a function used internally by the library and an API function of the library.
A simple workaround is not to use -fPIC when compiling the program. But this is not always possible, for example when the code of the main program has to be in a shared library itself. Another workaround is to rename the pointer to function a to something else. But I cannot find any real solution.
Replacing RTLD_LAZY with RTLD_NOW does not help.
I suspect there is a clash between two global symbols: the function a in the library and the function pointer a in the main program. One solution is to declare a in the main program as static. Alternatively, the Linux man page mentions the RTLD_DEEPBIND flag, a Linux-only extension, which you can pass to dlopen and which causes the library to prefer its own symbols over global ones.
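A sketch of the first suggestion, assuming Linux with glibc (link with -ldl on glibc older than 2.34). The function pointers get internal linkage, and the dlopen/dlsym results are checked instead of assumed. The helper load_and_call is hypothetical; it does what the original main did. The RTLD_DEEPBIND alternative would instead pass RTLD_LAZY | RTLD_DEEPBIND to dlopen, which requires defining _GNU_SOURCE before including dlfcn.h.

```c
#include <dlfcn.h>
#include <stdio.h>

/* static gives these pointers internal linkage, so the main program no
   longer has global symbols named a and c that can clash with the
   library's own functions a and c. */
static int (*a)(int b);
static int (*c)(int d);

/* Hypothetical helper: loads the library at path, resolves a and c, calls
   them like the original main, and stores the results.
   Returns 0 on success, -1 on failure. */
int load_and_call(const char *path, int *b_out, int *d_out)
{
    void *lib = dlopen(path, RTLD_LAZY);
    if (lib == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();   /* clear any stale error before calling dlsym */

    /* Idiom from the dlsym man page for converting void * to a function
       pointer without an ISO C warning. */
    *(void **)(&a) = dlsym(lib, "a");
    *(void **)(&c) = dlsym(lib, "c");
    if (a == NULL || c == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(lib);
        return -1;
    }

    *d_out = c(6);
    *b_out = a(5);

    dlclose(lib);
    a = NULL;   /* the pointers would dangle after dlclose */
    c = NULL;
    return 0;
}
```

With libtest.so built from the library code in the question (gcc -shared -fPIC), load_and_call("./libtest.so", &b, &d) should leave b equal to 6 and d equal to 8.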
It seems this issue can occur in one more case (it did for me). I have a program and a couple of dynamically linked libraries. When I added one more library, I used a function from one of my own static libraries in it, but I forgot to add that static library to the link list. The linker did not warn me about this, but the program crashed with a segmentation fault.
Maybe this will help for someone.
FWIW, I ran into a similar problem when compiling as C++ and forgetting about name mangling. A solution there is to use extern "C".

gcc - 2 versions, different treatment of inline functions

Recently I've come across a problem in my project. I normally compile it in gcc-4, but after trying to compile in gcc-3, I noticed a different treatment of inline functions. To illustrate this I've created a simple example:
main.c:
#include "header.h"
#include <stdio.h>
int main()
{
    printf("f() %i\n", f());
    return 0;
}
file.c:
#include "header.h"
int some_function()
{
    return f();
}
header.h:
inline int f()
{
    return 2;
}
When I compile the code in gcc-3.4.6 with:
gcc main.c file.c -std=c99 -O2
I get a linker error (multiple definition of f), and the same happens if I remove the -O2 flag. I know the compiler does not have to inline anything if it doesn't want to, so I assumed it placed f in the object file instead of inlining it in both main.c and file.c, hence the multiple definition error. Obviously I could fix this by making f static, at worst ending up with a few copies of f in the binary.
But I tried compiling this code in gcc-4.3.5 with:
gcc main.c file.c -std=c99 -O2
And everything worked fine, so I assumed the newer gcc inlined f in both cases and there was no function f in the binary at all (checked in gdb and I was right).
However, when I removed the -O2 flag, I got two undefined references to int f().
And here, I really don't understand what is happening. It seems like gcc assumed f would be inlined, so it didn't add it to the object file, but later (because there was no -O2) it decided to generate calls to these functions instead of inlining and that's where the linker error came from.
Now comes the question: how should I define and declare simple and small functions, which I want inline, so that they can be used throughout the project without the fear of problems in various compilers? And is making all of them static the right thing to do? Or maybe gcc-4 is broken and I should never have multiple definitions of inline functions in a few translation units unless they're static?
Yes, the behavior has been changed from gcc-4.3 onwards. The gcc inline doc (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Inline.html) details this.
Short story: in the old version, plain inline only tells gcc to inline calls to the function from the same file scope. However, it does not tell gcc that all callers are in that file scope, so gcc also keeps a linkable version of f() around, which explains your duplicate symbols error above.
Gcc 4.3 changed this behavior to be compatible with c99.
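Under C99 rules, a plain inline definition in a header is only an inline definition: it does not emit an external symbol f, which is why the unoptimized gcc-4.3 build in the question ended with undefined references. A minimal sketch, assuming -std=c99 or later: keep the inline definition in the header and add an extern declaration in exactly one .c file, which makes that translation unit provide the linkable copy. Both parts are shown in one file here for brevity.

```c
/* Sketch assuming -std=c99 or later (C99 inline semantics). */

/* header.h: without extern, this is only an "inline definition". It lets
   each .c file inline calls to f, but emits no external symbol f, so a call
   the compiler decides not to inline (e.g. without -O2) needs an external
   definition from somewhere else. */
inline int f(void)
{
    return 2;
}

/* In exactly ONE .c file (e.g. file.c): this declaration turns the inline
   definition into an external definition, so this translation unit emits
   the linkable copy of f. Putting it in more than one file would bring back
   the multiple-definition error. */
extern inline int f(void);

int some_function(void)
{
    return f();
}
```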
And, to answer your specific question:
Now comes the question: how should I define and declare simple and small functions, which I want inline, so that they can be used throughout the project without the fear of problems in various compilers? And is making all of them static the right thing to do? Or maybe gcc-4 is broken and I should never have multiple definitions of inline functions in a few translation units unless they're static?
If you want portability across gcc versions use static inline.
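A minimal sketch of the static inline approach applied to the question's header.h and file.c (shown in one file for brevity). Because f has internal linkage, every translation unit that needs an out-of-line copy gets its own private one, so neither the old GNU rules nor the C99 rules can produce a multiple-definition or undefined-reference error.

```c
/* header.h (sketch): static gives f internal linkage. If the compiler
   chooses not to inline a call, it emits a private copy of f in that
   translation unit; no external symbol f is ever created or required. */
static inline int f(void)
{
    return 2;
}

/* file.c (sketch): uses f exactly as before. */
int some_function(void)
{
    return f();
}
```

The cost is that several copies of f may end up in the binary when calls are not inlined, which for small functions like this is usually negligible.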

Resources