Suppose we have the following interface to a library:
// my_interface.h
typedef float (*myfunc)(float);
myfunc get_the_func();
And suppose that the implementation is as follows:
// my_impl.c
#include <math.h>
myfunc get_the_func() {
    return sinf;
}
And now suppose the client code does the following:
#include "my_interface.h"
...
myfunc func = get_the_func();
printf("%f\n", func(0.0f));
Does the standard guarantee that get_the_func() will return the address of the standard math library's sinf()? If so, where in the standard is this implied?
Please note that sinf() is not explicitly called anywhere.
The standard does not require external functions to be called: being referenced is enough for them to be kept in the results of translation. According to the standard's section 5.1.1.2.1, during the eighth (and last) phase of translation:
All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation.
Since returning a pointer to a function is considered a reference to it, the standard guarantees that sinf will be linked to whatever implementation you supply to the linker, which may or may not be the one from the standard math library.
The C standard doesn't guarantee that get_the_func() will return the address of the sinf function in the math library. It guarantees that whatever pointer is returned will be callable as if you called the sinf function, and that it will compare equal to any other pointer to that function.
On some architectures you might get a pointer to a descriptor of the function which the compiler handles specially and I've also seen function pointer values pointing to a dynamic linking trampoline and not the function itself.
I recall a discussion on the gcc mailing lists around 10 years ago if a function pointer cast to (void *) actually needs to compare equal to another function pointer cast to (void *) which was in connection with how IA64 implements function pointers (they could point to different descriptor structs in different libraries, which meant that to correctly implement compares the compiler would have to compare the contents of the descriptors and not the pointers themselves). But I don't remember what the standards lawyers decided and/or if this was solved in the linker.
Okay, so NOWHERE in the standard does it say that just because you #include <math.h> you will also link to the standard math library... As a matter of fact, that DOES NOT happen.
You HAVE to link to libm in order to use the standard math library, as in:
cc -o foo foo.c -lm
^^^^
The marked option is for the linker step; without it there is no linkage, irrespective of whether you return a library function as a value or actually call it.
External symbols are resolved either by explicitly specifying the archives/objects/libraries, or, on systems and environments that support it, lazily at runtime through dynamic linking.
On environments that support dynamic linking, weak linking, lazy linking, etc., there is no guarantee that references will ever be resolved: for lazy resolution to happen, the execution path has to be traversed.
Let's say it is. Still, your client needs to provide a linkage path for resolving sinf, either when they link or at runtime on environments that support it.
the point:
The client is free to resolve the symbol to an address in any way they see fit. So, in fact, there is no way to guarantee that your client will link against the standard library and that the symbol will resolve to the system library's sinf. The only thing you know is that if that code path is executed, it will either reach an address looked up through the sinf link, or crash unceremoniously on environments that don't have to resolve symbols at link time.
update/clarification:
To clarify: if sinf is used as a value, then it needs to be resolved, but there is still NO guarantee that when the client resolves the symbol during their link step it will resolve against the math library. If we are talking about GUARANTEES, that is.
Now, practically speaking, if the client links against the standard math library and doesn't pull off any of the overrides I pointed out above, then yes, that symbol will get resolved by a linkage against the standard library (either static or dynamic).
My original answer is a bit ehem "prissy" for which I apologize because we were talking about standards and guarantees thus the rantish nature. There is nothing for example stopping the client simply doing something like this:
foo.c:
#include "my_interface.h"
...
myfunc func = get_the_func();
printf("%f\n", func(0.0f));
first pass compile:
cc -o foo foo.c
they get an error that says sinf is unresolved, and so the client edits her source file:
foo.c:
#include "my_interface.h"
...
void * sinf = NULL;
myfunc func = get_the_func();
printf("%f\n", func(0.0f));
and now you have a fully resolved but nicely crashing program.
Related
I'm new to C and have read that each function may only be defined once, but I can't seem to reconcile this with what I'm seeing in the console. For example, I am able to overwrite the definition of printf without an error or warning:
#include <stdio.h>
extern int printf(const char *__restrict __format, ...) {
    putchar('a');
    return 1;
}

int main() {
    printf("Hello, world!");
    return 0;
}
So, I tried looking up the one-definition rule in the standard and found Section 6.9 (5) on page 155, which says (emphasis added):
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression [...], somewhere in the entire program there shall be exactly one external definition for that identifier; otherwise, there shall be no more than one.
My understanding of linkage is very shaky, so I'm not sure if this is the relevant clause or what exactly is meant by "entire program". But if I take "entire program" to mean all the stuff in <stdio.h> + my source file, then shouldn't I be prohibited from redefining printf in my source file since it has already been defined earlier in the "entire program" (i.e. in the stdio bit of the program)?
My apologies if this question is a dupe, I couldn't find any existing answers.
The C standard does not define what happens if there is more than one definition of a function.
… shouldn't I be prohibited…
The C standard has no jurisdiction over what you do. It specifies how C programs are interpreted, not how humans may behave. Although some of its rules are written using “shall,” this is not a command to the programmer about what they may or may not do. It is a rhetorical device for specifying the semantics of C programs. C 2018 4 2 tells us what it actually means:
If a “shall” or “shall not” requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined…
So, when you provide a definition of printf and the standard C library provides a definition of printf, the C standard does not specify what happens. In common practice, several things may happen:
The linker uses your printf. The printf in the library is not used.
The compiler has built-in knowledge of printf and uses that in spite of your definition of printf.
If your printf is in a separate source module, and that module is compiled and inserted into a library, then which printf the program uses depends on the order the libraries are specified to the linker.
While the C standard does not define what happens if there are multiple definitions of a function (or an external symbol in general), linkers commonly do. Ordinarily, when a linker processes a library file, its behavior is:
Examine each module in the library. If the module defines a symbol that is referenced by a previously incorporated object module but not yet defined, then include that module in the output the linker is building. If the module does not define any such symbol, do not use it.
Thus, for ordinary functions, the behavior of multiple definitions that appear in library files is defined by the linker, even though it is not defined by the C standard. (There can be complications, though. Suppose a program uses cos and sin, and the linker has already included a module that defines cos when it finds a library module that defines both sin and cos. Because the linker has an unresolved reference to sin, it includes this library module, which brings in a second definition of cos, causing a multiple-definition error.)
Although the linker behavior may be well defined, this still leaves the issue that compilers have built-in knowledge about the standard library functions. Consider this example. Here, I added a second printf, so the program has:
printf("Hello, world!");
printf("Hello, world!\n");
The program output is “aHello, world!\n”. This shows the program used your definition for the first printf call but used the standard behavior for the second printf call. The program behaves as if there are two different printf definitions in the same program.
Looking at the assembly language shows what happens. For the second call, the compiler decided that, since printf("Hello, world!\n"); is printing a string with no conversion specifications and ending with a new-line character, it can use the more-efficient puts routine instead. So the assembly language has call puts for the second printf. The compiler cannot do this for the first printf because it does not end with a new-line character, which puts automatically adds.
Please be aware of the difference between a declaration and a definition. The terms are totally different.
stdio.h only provides the declaration. Therefore, when you declare/define the function in your own file, as long as the prototype matches, the compiler is fine with it.
You are free to define it in your source file, and the final program will link to yours instead of the one in the library.
I have a header-only library that's currently calling malloc and free
This header is included in a lot of different static libraries, which are used to build differently configured programs.
I would like to be able to replace those calls with calls into another allocator, at link time -- based on whether that allocator library is included in the link step, without affecting other calls to malloc and free.
My idea is to have the library call customizable_malloc and customizable_free and have those symbols resolve to malloc and free "by default" -- then the allocator library can provide alternate definitions for customizable_malloc and customizable_free
However, I messed around with weak/alias/weakref attributes and I can't seem to get anything to work. Is there a way to do this?
Note: I know I can create an extra layer of indirection: customizable_malloc could be a weak alias to a function that calls malloc. But that adds a level of indirection that seems unnecessary.
Ideally, here's the steps I want the linker to take when it comes across a call to customizable_malloc:
Check if a definition for customizable_malloc exists
If it does, call it
If it does not, behave as if the call was to regular malloc.
Clarifying note: In a single-target scenario, this could be done with #define. The library could create macros customizable_malloc and customizable_free that default to malloc and free. However, this doesn't work in this case since things are being built into static libraries without knowledge of whether there's an override.
The extra level of indirection is the only way to do it. ELF (and other real-world binary formats') symbol definition syntax (including for weak symbols) does not provide any way to express a definition in terms of a reference to an external definition from somewhere else.
Just do the wrapper approach you're considering. It's simple, clean, and relative to the cost of malloc/free it's not going to make any big difference in performance.
You can achieve the desired outcome using the GNU ld --defsym option.
Example:
#include <malloc.h>
#include <stdio.h>
void *custom_malloc(size_t sz);
int main()
{
    void *p = custom_malloc(1);
    void *q = malloc(42); // important: malloc needs to be referenced somewhere
    printf("p = %p, q = %p\n", p, q);
    return 0;
}
Compiling this with gcc -c t.c and then linking will (naturally) fail with an unresolved reference to custom_malloc (if the library providing custom_malloc is not used):
$ gcc t.o
/usr/bin/ld: t.o: in function `main':
t.c:(.text+0xe): undefined reference to `custom_malloc'
collect2: error: ld returned 1 exit status
Adding --defsym=custom_malloc=malloc solves this:
$ gcc t.o -Wl,--defsym=custom_malloc=malloc && ./a.out
p = 0x558ca4dc22a0, q = 0x558ca4dc22c0
P.S. If malloc is not linked into the program (i.e. if I comment out the // important line), then --defsym fails:
$ gcc t.c -Wl,--defsym=custom_malloc=malloc && ./a.out
/usr/bin/ld:--defsym:1: unresolvable symbol `malloc' referenced in expression
...
But that is (I believe) not very relevant to your scenario.
P.P.S. As R correctly stated, the "extra level of indirection" could be a single unconditional jmp malloc instruction, and the overhead of such an indirection is unlikely to be measurable.
This question already has answers here:
Must declare function prototype in C? [duplicate]
I faced a similar problem in my (big) project.
//# include <string.h> // not included
void foo(char *str, const char *delim)
{
    char *tok = strtok(str, delim);
    // warning ^ "assignment makes pointer from integer without a cast"
    // [...]
}
The answer (just add #include <string.h> to get the prototype of strtok) indeed solved the issue.
However, due to my poor compiler/linker knowledge, I fail to understand how the process accepts a function that has not been prototyped.
I'd rather have expected the error undefined reference to function 'strtok', which is typical when you forget to include the right header.
[EDIT] I understand why this question has been marked as a duplicate, but I do think it is different: I am aware of "good practice" regarding includes; I am wondering about the compiler's behavior. However, I admit that I can find (part of) an answer to my question in this post: Are prototypes required for all functions in C89, C90 or C99? or this one: Must declare function prototype in C?
While linking your binary, unless you explicitly say otherwise, it is linked with the default C standard library (glibc, for example) anyway, and that is where the function is defined.
So, when you miss the header file containing the declaration, you end up with the warning (when the function's use mismatches the implicitly assumed prototype — see the note below), but at link time, thanks to the presence of the default C library, the program links successfully anyway.
FWIW, as of C99, support for implicit function declarations has been dropped from the standard, but most compilers still accept the old behavior to keep backward compatibility with legacy code.
NOTE:
Implicit function declaration: before C99, the standard did not mandate a forward declaration for a function. If a function was used without one, it was assumed to return int and accept any number of parameters.
Because gcc automatically links your code with the C library. The "undefined reference to function" error is typically issued by the linker when it couldn't resolve a symbol, and that only occurs if the symbol couldn't be found in any of the linked libraries (the order of linking might also matter). But the C library is linked by default -- it's as if you had passed -lc. So you don't get that error.
If you tell gcc to not link the C library using -nostdlib then you'll see the error you expected:
$ gcc -nostdlib file.c
On the other hand, you should always provide prototypes for functions.
You may be interested in other similar linker options such as -nodefaultlibs, -nostartfiles, etc., which you can find in gcc's manual.
I saw a snippet of code on CodeGolf that's intended as a compiler bomb, where main is declared as a huge array. I tried the following (non-bomb) version:
int main[1] = { 0 };
It seems to compile fine under Clang and with only a warning under GCC:
warning: 'main' is usually a function [-Wmain]
The resulting binary is, of course, garbage.
But why does it compile at all? Is it even allowed by the C specification? The section that I think is relevant says:
5.1.2.2.1 Program startup
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters [...] or with two parameters [...] or in some other implementation-defined manner.
Does "some other implementation-defined manner" include a global array? (It seems to me that the spec still refers to a function.)
If not, is it a compiler extension? Or a feature of the toolchains, that serves some other purpose and they decided to make it available through the frontend?
It's because C allows for a "non-hosted", or freestanding, environment, which doesn't require a main function. This frees the name main for other uses. That is why the language as such allows such declarations. Most compilers are designed to support both modes (the difference is mostly in how linking is done) and therefore don't disallow constructs that would only be illegal in a hosted environment.
The section you refer to in the standard covers the hosted environment; the corresponding text for freestanding is:
in a freestanding environment (in which C program execution may take place without any
benefit of an operating system), the name and type of the function called at program
startup are implementation-defined. Any library facilities available to a freestanding
program, other than the minimal set required by clause 4, are implementation-defined.
If you then link it as usual, things will go badly, since the linker normally has little knowledge about the nature of a symbol (what type it has, or even whether it's a function or a variable). In this case the linker will happily resolve calls to main to the variable named main. If the symbol is not found, it will result in a link error.
If you link it as usual, you're basically trying to use the compiler in hosted operation, and then not defining main as you're supposed to means undefined behavior, per Annex J.2:
the behavior is undefined in the following circumstances:
...
A program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1)
The purpose of the freestanding mode is to make C usable in environments where (for example) standard libraries or CRT initialization are not available. This means the code that runs before main is called (the CRT initialization that sets up the C runtime) might not be provided, and you would be expected to supply it yourself (and you may decide to have a main, or decide not to).
If you are interested in how to create a program inside a main array: https://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-a-function.html. The example source there just contains a char (and later int) array called main, filled with machine instructions.
The main steps and problems were:
Obtain the machine instructions of a main function from a gdb memory dump and copy them into the array
Mark the data in main[] as executable by declaring it const (the data is apparently either writable or executable)
Last detail: change an address to point to the actual string data.
The resulting C code is just
const int main[] = {
-443987883, 440, 113408, -1922629632,
4149, 899584, 84869120, 15544,
266023168, 1818576901, 1461743468, 1684828783,
-1017312735
};
but results in an executable program on a 64 bit PC:
$ gcc -Wall final_array.c -o sixth
final_array.c:1:11: warning: ‘main’ is usually a function [-Wmain]
const int main[] = {
^
$ ./sixth
Hello World!
The problem is that main is not a reserved identifier. The C standard only says that in hosted systems there is usually a function called main. But nothing in the standard prevents you from abusing the same identifier for other sinister purposes.
GCC gives you a smug warning "main is usually a function", hinting that the use of the identifier main for other unrelated purposes isn't a brilliant idea.
Silly example:
#include <stdio.h>
int main (void)
{
    int main = 5;
main:
    printf("%d\n", main);
    main--;
    if (main)
    {
        goto main;
    }
    else
    {
        int main (void);
        main();
    }
}
This program will repeatedly print the numbers 5,4,3,2,1 until it gets a stack overflow and crashes (don't try this at home). Unfortunately, the above program is a strictly conforming C program and the compiler can't stop you from writing it.
main is - after compiling - just another symbol in an object file, like many others (global functions, global variables, etc.).
The linker will link the symbol main regardless of its type. Indeed, the linker cannot see the type of the symbol at all (it can see that it isn't in the .text section, but it doesn't care ;)).
Using gcc, the standard entry point is _start, which in turn calls main() after preparing the runtime environment. So it will jump to the address of the integer array, which usually will result in a bad instruction, segfault or some other bad behaviour.
This all of course has nothing to do with the C-standard.
It only compiles because you don't use the proper options (and works because linkers sometimes only care about the names of symbols, not their types).
$ gcc -std=c89 -pedantic -Wall x.c
x.c:1:5: warning: ISO C forbids zero-size array ‘main’ [-Wpedantic]
int main[0];
^
x.c:1:5: warning: ‘main’ is usually a function [-Wmain]
On the other hand, this variant:
const int main[1] = { 0xc3c3c3c3 };
compiles and executes on x86_64... it does nothing, just returns (0xc3 is the x86 ret opcode) :D
Why does redefinition of a function already present in a dynamic library not throw any compilation or linking error?
In the code below:
#include "calc_mean.h"
#include <stdio.h>
int mean(int t, int v) {
    return 0;
}

int main () {
    int theMean = mean(3, 6);
    printf("\n %d\n", theMean);
}
Inside the shared library, a definition of the mean function is already present, as below:
#include <stdio.h>
#include "calc_mean.h"
int mean(int a, int b) {
    return (a + b) / 2;
}
The definition of the mean function is already present in the shared library libmean.so. But during compilation I don't see any redefinition error, and compilation is successful.
And on execution, the output I see is 0 instead of 4, so the mean defined inside the shared library is not executed; the one inside the main module is.
Why is this happening?
The linker only links in a function from a library if the function has not already been found during the compilation/linking process.
The reason for the difference in behavior is that there are different kinds of symbols. A function pulled from a library behaves like a weak symbol: it is only included if the name is not already defined. nm is a tool for listing the symbols in an object file or executable; its man page includes a list of the symbol types.
There is also a wikipedia page on weak symbols.
Having two definitions of one externally-visible function (even if the definitions are identical, for non-inline functions) causes undefined behaviour, with no diagnostic required. (Ref: C99 6.9#5 and Annex J.2)
In C, some illegal code requires a compiler diagnostic and some doesn't. Typically, the cases that do not require a diagnostic exist because:
it would be considered too prohibitive to require all compilers to detect and report the error, or
there were existing implementations that did not diagnose it, and the standard committee did not want to render them non-conforming.
In this case, my guess would be that this is a case of the first one; they wanted to leave open the option for compilers/linkers to implement weak symbols as an extension, so they did not specify that the compiler must give a warning here. Or possibly it is actually difficult to detect this in general, I've never tried to write a linker!
It should be considered a quality-of-implementation issue if no diagnostic is given. Perhaps it is possible to pass different flags to your linker so that it does reject this code; if not, then you could file a bug report or a feature request.
Did you link the shared library correctly? Because the compiler should give the error:
multiple definition of 'mean'