Recently I've learnt about implicit function declarations in C. The main idea is clear, but I have some trouble understanding the linking process in this case.
Consider the following code (file a.c):
#include <stdio.h>
int main() {
    double someValue = f();
    printf("%f\n", someValue);
    return 0;
}
If I try to compile it:
gcc -c a.c -std=c99
I see a warning about implicit declaration of function f().
If I try to compile and link:
gcc a.c -std=c99
I get an undefined reference error. So far everything makes sense.
Then I add another file (file b.c):
double f(double x) {
    return x;
}
And invoke the following command:
gcc a.c b.c -std=c99
Surprisingly, everything links successfully. Of course, after invoking ./a.out I see rubbish output.
So, my question is: how are programs with implicitly declared functions linked? And what happens in my example under the hood of the compiler/linker?
I read a number of topics on SO like this, this and this one but still have problems.
First of all, since C99, implicit declaration of functions has been removed from the standard. Compilers may still support it for compiling legacy code, but it is not mandatory. Quoting the standard's foreword,
remove implicit function declaration
That said, as per C11, chapter §6.5.2.2
If the function is defined with a type that does not include a prototype, and the types of
the arguments after promotion are not compatible with those of the parameters after
promotion, the behavior is undefined.
So, in your case,
the function call itself is an implicit declaration (which became non-standard as of C99),
and due to the mismatch of the function signatures [implicitly declared functions were assumed to have an int return type], your code invokes undefined behavior.
Just to add a bit more reference: if you try to define the function in the same compilation unit after the call, you'll get a compilation error due to the mismatched signatures, as the sketch below shows.
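A minimal sketch of that single-file case (same f as in the question; this deliberately does not compile, gcc rejects the definition with something like "conflicting types for 'f'"):

#include <stdio.h>

int main(void) {
    double someValue = f(); /* warning: implicit declaration of function 'f' (assumed int f()) */
    printf("%f\n", someValue);
    return 0;
}

double f(double x) { /* error: conflicting types for 'f' */
    return x;
}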
However, with your function defined in a separate compilation unit (and no prototype declaration in scope), the compiler has no way to check the signatures. After compilation, the linker takes the object files and, because there is no type checking at link time (and no type information in the object files either), happily links them. The end result is a successful compile and link, and undefined behavior.
Here is what is happening.
Without a declaration for f(), the compiler assumes an implicit declaration like int f(), and then happily compiles a.c.
When compiling b.c, the compiler does not have any prior declaration for f() either, so it deduces the type of f() from its definition. Normally you would put a declaration of f() in a header file and include it in both a.c and b.c. Because both files would then see the same declaration, the compiler could enforce conformance and complain about any entity that does not match the declaration. But in this case, there is no common prototype to refer to.
In C, the compiler does not store any information about the prototype in the object files, and the linker does not perform any conformance checks (it can't). All it sees is an unresolved symbol f in a.c and a symbol f defined in b.c. It happily resolves the symbols and completes the link.
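As a rough illustration, inspecting the object files with nm (assuming a GNU toolchain; addresses vary, and unrelated symbols such as printf are omitted here) shows that only bare names survive compilation:

$ gcc -c a.c b.c -std=c99
$ nm a.o
                 U f
0000000000000000 T main
$ nm b.o
0000000000000000 T f

The U marks an undefined reference to f; the T marks a definition in the text section. No parameter or return types appear anywhere.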
Things break down at run time, though, because the compiler set up the call in a.c based on the prototype it assumed there, which does not match what the definition in b.c expects. f() (from b.c) will get a junk argument off the stack (or a register), and return it as a double, which will then be interpreted as an int on return in a.c.
How are programs with implicitly declared functions linked? And what happens in my example under the hood of the compiler/linker?
The implicit int rule has been outlawed by the C standard since C99, so it's not valid to have programs with implicit function declarations. Before C99, if no visible declaration was available, the compiler implicitly declared the function with an int return type.
Surprisingly, everything links successfully. Of course, after invoking ./a.out I see rubbish output.
Because you didn't have a prototype, the compiler implicitly declared f() with an int return type. But the actual definition of f() returns a double. The two types are incompatible, and this is undefined behaviour.
This is undefined even in C89/C90, where the implicit int rule is valid, because the implicit declaration is not compatible with the type f() actually returns. So this example (with a.c and b.c) is undefined in all C standards.
It's not useful or valid any more to have implicit function declarations, so the actual detail of how the compiler/linker handles them is only of historical interest. It goes back to the pre-standard times of K&R C, which didn't have function prototypes, and functions returned int by default. Function prototypes were added to C in the C89/C90 standard. Bottom line: you must have prototypes (or define functions before use) for all functions in valid C programs.
After compiling, all type information is lost (except maybe in debug info, but the linker doesn't pay attention to that). The only thing that remains is: "there is a symbol called f at address 0xdeadbeef".
The point of headers is to tell C about the type of the symbol, including, for functions, what arguments it takes and what it returns. If you mismatch the real ones with the ones you declare (either explicitly or implicitly), you get undefined behavior.
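A sketch of the conventional fix for the question's example: one shared prototype in a header, included by both translation units, so the compiler can check every call and the definition against the same declaration. (Note the call must now pass an argument; with the prototype in scope, the original argument-less call would be rejected at compile time.)

/* f.h */
#ifndef F_H
#define F_H
double f(double x);
#endif

/* a.c */
#include <stdio.h>
#include "f.h"

int main(void) {
    double someValue = f(1.0); /* checked against the prototype */
    printf("%f\n", someValue);
    return 0;
}

/* b.c */
#include "f.h"

double f(double x) { /* also checked against the prototype */
    return x;
}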
Related
I just resolved an absolute headbanger of a problem, and the issue was so simple, yet so elusive. So frustratingly hidden behind a lack of compiler feedback and an excess of compiler complacency (which is rare!). While writing this post, I found a few similar questions, but none that quite match my scenario.
Calling method without typeless argument produces a compiler error when the definition includes strongly typed args.
Why does gcc allow arguments to be passed to a function defined to be with no arguments? and C function with incomplete declaration both pass excess arguments to an argumentless function.
Why does an empty declaration work for definitions with int arguments but not for float arguments? does contain a successfully building declaration/definition mismatch, but has no invocation, where I would expect to see a too few arguments message.
I have a function declaration with no args, a call to that function with no args, and the function definition below with args. Somehow, C manages to successfully call the function, no warning, no error, but very undefined behaviour. Where does the function get the missing argument from? Why don't I get a linker error since the no-arg function isn't defined? Why don't I get a compiler error because I'm redefining a function with a different signature? Why, oh why, is this allowed?
Compiling as C++ code (gcc -x c++, enabling Compile To Binary on Godbolt) I get a linker error as expected, because of course C++ allows overloading, and the no-arg overload isn't defined. By checking with Godbolt, compiling with Clang and MSVC as C code also both build successfully, with only MSVC spitting out a minor warning.
Here is my reduced example for Godbolt.
// Compile with GCC or Clang -x c -Wall -Wextra
// Compile with MSVC /Wall /W4 /Tc
#include <stdio.h>
#include <stdlib.h>

// This is just so Godbolt can do an MSVC build
#ifndef _MSC_VER
# include <unistd.h>
#else
# define read(file, output, count) (InputBuffer[count] = count, fd)
#endif

static char InputBuffer[16];

int ReadInput(); // <-- declared with no args

int main(void)
{
    int count;

    count = ReadInput(); // <-- called with no args

    printf("%c", InputBuffer[0]); // just so the results, and hence the entire function call,
    printf("%d", count);          // don't get optimised away by not being used (even though I'm
    return 0;                     // not using any optimisation... just being cautious)
}

int ReadInput(int fd) // <-- defined with args!
{
    return read(fd, InputBuffer, 1); // arg is definitely used, it's not like it's optimised away!
}
Where does the function get the missing argument from?
Typically, the called function is compiled to get its parameters from the places the arguments would be passed according to the ABI (Application Binary Interface) being used. This is necessarily true when the called function is in a separate translation unit (and there is no link-time optimization), so the compiler cannot adjust it according to the calling code. If the call and the called function are in the same translation unit, the compiler could do other things.
For example, if the ABI says the first int class parameter is passed in processor register r4, then the called function will get its parameter from register r4. Since the caller has not put an argument there, the called function gets whatever value happens to be in r4 from previous use.
Why don't I get a linker error since the no-arg function isn't defined?
C implementations generally resolve identifiers by name only. Type information is not part of the name or part of resolution. A function declared as int ReadInput() has the same name as a function declared as int ReadInput(int fd), and, as far as the linker is concerned, a definition of one will satisfy a reference to the other.
Why don't I get a compiler error because I'm redefining a function with a different signature?
The definitions are compatible. In C, the declaration int ReadInput() does not mean the function has no parameters. It means “There is a function named ReadInput that returns int, and I am not telling you what its parameters are.”
The declaration int ReadInput(int fd) means “There is a function named ReadInput that returns int, and it takes one parameter, an int.” These declarations are compatible; neither says anything inconsistent with the other.
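A small sketch of those compatibility rules, with hypothetical functions g and h:

int g();        /* no prototype: parameter types unspecified */
int g(int x);   /* compatible: int is unchanged by the default
                   argument promotions */

int h();
int h(float x); /* error: conflicting types for 'h' -- float IS changed
                   by the default argument promotions (to double), so
                   this is not compatible with the no-prototype form */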
Why, oh why, is this allowed?
History. Originally, C did not supply parameter information in function declarations, just in definitions. The prototype-less declarations are still allowed so that old software continues to work.
Other answers explained why it is legal to call a function that was declared without a prototype (but that it is your responsibility to get the arguments right). But you might be interested in the -Wstrict-prototypes warning option accepted by both GCC and clang, which is documented to "Warn if a function is declared or defined without specifying the argument types." Your code then yields warning: function declaration isn't a prototype.
Try it on godbolt.
(I'm kind of surprised this warning isn't enabled with -Wall -Wextra.)
In C, unlike in C++, declaring a function with an empty parameter list means that the function may be called with any number of arguments. If you want it to really take no arguments, you have to declare that explicitly:
int ReadInput(void);
First of all, I know this way of programming is not good practice. For an explanation of why I'm doing this, read on after the actual question.
When defining a function in C like this:
int f(n, r) {…}
The types of r and n will default to int. The compiler will likely generate a warning about it, but let's choose to ignore that.
Now suppose we call f but, accidentally or otherwise, leave out an argument:
f(25);
This will still compile just fine (tested with both gcc and clang). However, there is no warning from gcc about the missing argument.
So my question is:
Why does this not produce a warning (in gcc) or error?
What exactly happens when it is executed? I assume I'm invoking undefined behaviour but I'd still appreciate an explanation.
Note that it does not work the same way when I declare int f(int n, int r) {…}; neither gcc nor clang will compile this.
Now if you're wondering why I would do such a thing, I was playing Code Golf and tried to shorten my code which used a recursive function f(n, r). I needed a way to call f(n, 0) implicitly, so I defined F(n) { return f(n, 0) } which was a little too many bytes for my taste. So I wondered whether I could just omit this parameter. I can't, it still compiles but no longer works.
While optimizing this code, it was pointed out to me that I could just leave out a return at the end of my function – no warning from gcc about this either. Is gcc just too tolerant?
You don't get any diagnostics from the compiler because you are not using modern "prototyped" function declarations. If you had written
int f(int n, int r) {…}
then a subsequent f(25) would have triggered a diagnostic. With the compiler on the computer I'm typing this on, it's actually a hard error.
"Old-style" function declarations and definitions intentionally cause the compiler to relax many of its rules, because the old-style code that they exist for backward compatibility with would do things like this all the dang time. Not the thing you were trying to do, hoping that f(25) would somehow be interpreted as f(25, 0), but, for instance, f(25) where the body of f never looks at the r argument when its n argument is 25.
The pedants commenting on your question are pedantically correct when they say that literally anything could happen (within the physical capabilities of the computer, anyway; "demons will fly out of your nose" is the canonical joke, but it is, in fact, a joke). However, it is possible to describe two general classes of things that are what usually happens.
1. With older compilers, what usually happens is that code is generated for f(25) just as it would have been if f only took one argument. That means the memory or register location where f will look for its second argument is uninitialized, and contains some garbage value.
2. With newer compilers, on the other hand, the compiler is liable to observe that any control-flow path passing through f(25) has undefined behavior, and based on that observation, assume that all such control-flow paths are never taken, and delete them. Yes, even if it's the only control-flow path in the program. I have actually witnessed Clang spit out main: ret for a program all of whose control-flow paths had undefined behavior!
GCC not complaining about f(n, r) { /* no return statement */ } is another instance of the rule relaxation described above, where an old-style function definition suppresses a diagnostic. void was invented in the 1989 C standard; prior to that, there was no way to say explicitly that a function does not return a value. So you don't get a diagnostic, because the compiler has no way of knowing that you didn't mean to do that.
Independently of that, yes, GCC's default behavior is awfully permissive by modern standards. That's because GCC itself is older than the 1989 C standard and nobody has reexamined its default behavior in a long time. For new programs, you should always use -Wall, and I recommend also at least trying -Wextra, -Wpedantic, -Wstrict-prototypes, and -Wwrite-strings. In fact, I recommend going through the "Warning Options" section of the manual and experimenting with all of the additional warning options. (Note however that you should not use -std=c11, because that has a nasty tendency to break the system headers. Use -std=gnu11 instead.)
First off, the C standard doesn't distinguish between warnings and errors. It only talks about "diagnostics". In particular, a compiler can always produce an executable (even if the source code is completely broken) without violating the standard.1
The types of r and n will default to int.
Not anymore. Implicit int has been gone from C since 1999. (And your test code requires C99 because for (int i = 0; ... isn't valid in C90).
In your test code gcc does issue a diagnostic for this:
.code.tio.c: In function ‘f’:
.code.tio.c:2:5: warning: type of ‘n’ defaults to ‘int’ [-Wimplicit-int]
It's not valid code, but gcc still produces an executable (unless you enable -Werror).
If you add the required types (int f(int n, int r)), it uncovers the next issue:
.code.tio.c: In function ‘main’:
.code.tio.c:5:3: error: too few arguments to function ‘f’
Here gcc somewhat arbitrarily decided not to produce an executable.
Relevant quotes from C99 (and probably C11 too; this text hasn't changed in the n1570 draft):
6.9.1 Function definitions
Constraints
[...]
If the declarator includes an identifier list, each declaration in the declaration list shall
have at least one declarator, those declarators shall declare only identifiers from the
identifier list, and every identifier in the identifier list shall be declared.
Your code violates a constraint (your function declarator includes an identifier list, but there is no declaration list), which requires a diagnostic (such as the warning from gcc).
Semantics
[...] If the
declarator includes an identifier list, the types of the parameters shall be declared in a
following declaration list.
Your code violates this shall rule, so it has undefined behavior. This applies even if the function is never called!
6.5.2.2 Function calls
Constraints
[...]
If the expression that denotes the called function has a type that includes a prototype, the
number of arguments shall agree with the number of parameters. [...]
Semantics
[...]
[...] If the number of arguments does not equal the number of parameters, the
behavior is undefined. [...]
The actual call also has undefined behavior if the number of arguments passed doesn't match the number of parameters the function has.
As for omitting return: This is actually valid as long as the caller doesn't look at the returned value.
Reference (6.9.1 Function definitions, Semantics):
If the } that terminates a function is reached, and the value of the function call is used by
the caller, the behavior is undefined.
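For example (a sketch, with a hypothetical function report):

#include <stdio.h>

int report(void)
{
    puts("side effect only");
    /* no return statement: falling off the closing } is fine by itself */
}

int main(void)
{
    report();          /* OK: the return value is never used */

    int n = report();  /* undefined behavior: here the value IS used */
    printf("%d\n", n);
    return 0;
}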
1 The sole exception seems to be the #error directive, about which the standard says:
The implementation shall not successfully translate a preprocessing translation unit
containing a #error preprocessing directive unless it is part of a group skipped by
conditional inclusion.
According to C How to Program (Deitel):
Standard library functions like printf and scanf are not part of the C programming language. For example, the compiler cannot find a spelling error in printf or scanf. When the compiler compiles a printf statement, it merely provides space in the object program for a “call” to the library function. But the compiler does not know where the library functions are—the linker does. When the linker runs, it locates the library functions and inserts the proper calls to these library functions in the object program. Now the object program is complete and ready to be executed. For this reason, the linked program is called an executable. If the function name is misspelled, it is the linker which will spot the error, because it will not be able to match the name in the C program with the name of any known function in the libraries.
These statements leave me doubtful because of the existence of header files. These files are included during the preprocessing phase, before compilation, and, as I've read, they are used by the compiler.
So if I write print instead of printf, how can the compiler not see that there is no function declared with that name, and throw an error?
If it is as the book says, why can I declare functions in header files if the compiler doesn't look at them?
So if I write print instead of printf, how can the compiler not see that there is no function declared with that name, and throw an error?
You are right. If you made a typo in any function name, any modern compiler should complain about it. For example, gcc complains for the following code:
$ cat test.c
int main(void)
{
    unknown();
    return 0;
}
$ gcc -c -Wall -Wextra -std=c11 -pedantic-errors test.c
test.c: In function ‘main’:
test.c:3:5: error: implicit declaration of function ‘unknown’ [-Wimplicit-function-declaration]
unknown();
^
However, in the pre-C99 era of the C language, any function whose declaration wasn't seen by the compiler was assumed to return an int. So, if you are compiling in pre-C99 mode, a compiler isn't required to warn about it.
Fortunately, this implicit int rule was removed from the C language in C99, and a compiler is required to issue a diagnostic for it in modern C (>= C99).
But if you provide only a declaration or prototype for the function:
$ cat test.c
int unknown(void); /* function prototype */
int main(void)
{
    unknown();
    return 0;
}
$ gcc -c -Wall -Wextra -std=c11 test.c
$
(Note: I have used the -c flag to compile without linking; if you don't use -c, compiling and linking are done in a single step, and you would get an undefined-reference error from the linker.)
There's no issue, despite the fact that you do not have a definition for unknown() anywhere. This is because the compiler assumes unknown() has been defined elsewhere; only when the linker tries to resolve the symbol unknown will it complain if it can't find the definition.
Typically, the header file(s) only provide the necessary declarations or prototypes (in the example above I have provided a prototype for unknown directly in the file itself; it might as well be done via a header file), and usually not the actual definition. Hence, the author is correct in the sense that the linker is the one that spots the error.
So if I write print instead of printf, how can the compiler not see that there is no function declared with that name, and throw an error?
The compiler can see that there is no declaration in scope for the identifier designating the function. Most will emit a warning under those circumstances, and some will emit an error, or can be configured to do so.
But that's not the same thing as the compiler detecting that the function doesn't exist. It's the compiler detecting that the function name has not been declared. The compiler will exhibit the same behavior if you spell the function name correctly but do not include a prior declaration for it.
Furthermore, C90 and pre-standardization C permitted calls to functions without any prior declaration. Such calls do not conform to C99 or later, but most compilers still do accept them (usually with a warning) for compatibility purposes.
If it is as the book says, why can I declare functions in header files if the compiler doesn't look at them?
The compiler does see them, and does use the declarations. Moreover, it relies on the prototype, if the declaration provides one, to perform appropriate argument and return value conversions when you call the function. Furthermore, if you use functions whose argument types are altered by the default argument promotions, then your calls to such functions are non-conforming if no prototype is in scope at the point of the call, and undefined behavior results.
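A sketch of that promotion pitfall, using a hypothetical function half with a float parameter (two translation units, shown in one listing):

/* half.c */
float half(float x) { return x / 2.0f; }

/* main.c -- compiled as C89, with no declaration of half in scope */
int main(void)
{
    /* The argument undergoes the default argument promotion from
       float to double, but the definition expects a plain float, so
       the call is non-conforming (and half is also implicitly assumed
       to return int, compounding the mismatch): undefined behavior. */
    half(3.0f);
    return 0;
}

/* The fix: make the prototype  float half(float x);  visible before
   the call, usually via a shared header. */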
I write "hello world" program in C.
void main()
{
    printf("Hello World");
}
// note that I haven't included any header file
The program compiles with a warning:
vikram#vikram-Studio-XPS-1645:~$ gcc hello.c
hello.c: In function ‘main’:
hello.c:2:2: warning: incompatible implicit declaration of built-in function ‘printf’
vikram#vikram-Studio-XPS-1645:~$ ./a.out
Hello Worldvikram#vikram-Studio-XPS-1645:~$
How is this possible? How does the OS link a library without including any header?
The compiler builds your source file with a reference to a function called printf(), without knowing what arguments it actually takes or what its return type is. The generated assembly contains a push of the address of the string "Hello World" in the static data area of your program, followed by a call to printf.
When linking your object file into an executable, the linker sees a reference to printf and supplies the C standard library function printf(). By coincidence, the argument you have passed (const char*) is compatible with the declaration of the real printf(), so it functions correctly. However, note that the printf() that your program implicitly declares has return type int (I think), which the standard printf() also has; but if they differed, and you were to assign the result of calling printf() to a variable, you would be in the land of undefined behaviour and you would likely get an incorrect value.
Long story short: #include the correct headers to get the correct declarations for functions you use, because this kind of implicit declaration is deprecated, because it is error-prone.
The printf function is in the C standard library (libc in your case), which is linked in implicitly (actually gcc has a printf builtin, but that's beside the point).
Including the header doesn't bring in any functions for the linker, it simply informs the compiler about their declarations (i.e. "what they look like").
Obviously you should always include headers otherwise you force the compiler into making assumptions about what the functions look like.
In C, if you use a standard library function, you have to include the standard header where the function is declared. For printf you have to include stdio.h header file.
In C89 (and GNU C89 which is the language by default on gcc), a function declaration can be sometimes omitted because there is a feature called implicit function declaration: when a function identifier foo is used and the function has not been declared, the implementation would use this declaration:
/* foo is a function with an unspecified number of arguments */
extern int foo();
But this declaration is OK only for functions that return int and take an unspecified but fixed number of arguments. If the function accepts a variable number of arguments (like printf), such a program invokes undefined behavior.
Here is what C89/C90 says:
(C90, 6.7.1) "If a function that accepts a variable number of arguments is defined without a parameter type list that ends with the ellipsis notation, the behavior is undefined."
So gcc is kind enough to compile even in C89 and GNU C89: a compiler could refuse to compile.
Also note that
void main() { ... }
is not a valid definition for main (at least on hosted implementations which is probably your case).
If your main function doesn't take any argument use this valid definition:
int main(void) { ... }
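Putting the two fixes together, a conforming version of the program from the question looks like this:

#include <stdio.h>  /* provides the prototype for printf */

int main(void)
{
    printf("Hello World\n");
    return 0;
}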
The header usually1 contains only function declarations, symbolic constants, and macro definitions; it doesn't usually include function definitions.
All stdio.h gives you is the prototype declaration for printf:
int printf(const char * restrict format, ...); // as of C99
The implementation of printf is in a separate library file that your code links against.
Your code "works" for two reasons:
1. Under C89 and earlier versions, if the compiler sees a function call before a declaration or definition of that function, it will assume that the function returns int and takes an unspecified number of parameters;
2. The implementation of printf returns an int, and you passed in an argument that just happens to be compatible with what the implementation of printf expects for the first argument.
And to echo what everyone else says, use int main(void) or int main(int argc, char **argv). Unless your compiler documentation explicitly lists void main() as a legal signature, using it invokes undefined behavior (which means anything from your code running with no apparent issues, to crashing on exit, to failing to load at all).
I say "usually"; I've run across some headers that contained code, but those were usually written by people who didn't know what they were doing. There may be very rare occasions where putting code in a header is justified, but as a rule it's bad practice.
hello.c:2:2: warning: incompatible implicit declaration of built-in function ‘printf’
To deal with this warning, you should include the header file (stdio.h). You're accidentally using an old feature of C that has been deprecated since 1999.
Also, the fact that the link doesn't fail simply means that the standard C library is linked in by default. Whether or not you have included the relevant header is immaterial.
As the question states, what exactly are the implications of having the 'implicit declaration of function' warning? We just cranked up the warning flags on gcc and found quite a few instances of these warnings and I'm curious what type of problems this may have caused prior to fixing them?
Also, why is this a warning and not an error? How is gcc even able to successfully link this executable? As you can see in the example below, the executable functions as expected.
Take the following two files for example:
file1.c
#include <stdio.h>
int main(void)
{
    funcA();
    return 0;
}
file2.c
#include <stdio.h>
void funcA(void)
{
    puts("hello world");
}
Compile & Output
$ gcc -Wall -Wextra -c file1.c file2.c
file1.c: In function 'main':
file1.c:3: warning: implicit declaration of function 'funcA'
$ gcc -Wall -Wextra file1.o file2.o -o test.exe
$ ./test.exe
hello world
If the function has a definition that matches the implicit declaration (i.e. it returns int, has a fixed number of arguments, and does not have a prototype), and you always call it with the correct number and types of arguments, then there are no negative implications (other than bad, obsolete style).
That is, in your code above, it is as if the function had been declared as:
int funcA();
Since this doesn't match the function definition, the call to funcA() from file1.c invokes undefined behaviour, which means that it can crash. On your architecture, with your current compiler, it obviously doesn't - but architectures and compilers change.
GCC is able to link it because the symbol representing the function entry point doesn't change when the function type changes (again... on your current architecture, with your current compiler - although this is quite common).
Properly declaring your functions is a good thing - if for no other reason than that it allows you to give your function a prototype, which means that the compiler must diagnose it if you are calling it with the wrong number or types of arguments.
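For instance, a sketch based on file1.c above: once a prototype is visible, a bad call becomes a compile-time error instead of silent undefined behaviour.

#include <stdio.h>

void funcA(void);   /* prototype; in real code this would live in a shared header */

int main(void)
{
    funcA();        /* OK: matches the prototype */
    /* funcA(42);      would now be rejected: too many arguments to 'funcA' */
    return 0;
}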
It has the same behaviour as using a non-prototype function declaration at block scope with an int return type. Because the return type can't be specified, it defaults to int, like everything else in (pre-C99) C that is declared without a type.
The reason functions can be implicitly declared is that functions can only be defined at file scope, whereas for an undeclared variable it would be unclear whether it should have block scope or file scope; so implicit variable declarations are disallowed, rather than the language picking one and providing an implicit tentative definition at file or block scope. Indeed, the actual implicit function declaration has block scope, so you'll get a warning for the first reference to the function in each function it is referenced in.
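A sketch of that block-scope behaviour (hypothetical function g, assuming a compiler that diagnoses each implicit declaration as described above):

void first(void)
{
    g();   /* warning: implicit declaration of function 'g' */
}

void second(void)
{
    g();   /* warned about again: the implicit declaration created in
              first() had block scope, so it is not visible here */
}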