Context
I'm writing a compiler for a language that requires lots of runtime functions. I'm using LLVM as my backend, so the codegen needs types for all those runtime entities (functions, structs, etc.). Instead of defining all of them manually using the LLVM APIs or handwriting the LLVM IR, I'd like to write the headers in C and compile them to bitcode that the compiler can pull in with LLVMParseBitcodeInContext2.
Issue
The issue I'm having is that clang doesn't seem to keep any of the type declarations that aren't used by any function definitions. Clang has -femit-all-decls, which sounds like it's supposed to solve this, but unfortunately it doesn't, and Googling suggests it's misnamed, as it only affects unused definitions, not declarations.
I then thought that if I compiled the headers alone into .gch files, I could pull them in with LLVMParseBitcodeInContext2 the same way (since the docs say they use "the same" bitcode format). However, doing so fails with error: Invalid bitcode signature, so something must be different. Perhaps the difference is small enough to work around?
Any suggestions or relatively easy workarounds that can be automated for a complex runtime? I'd also be interested if someone has a totally alternative suggestion on approaching this general use case, keeping in mind I don't want to statically link in the runtime function bodies for every single object file I generate, just the types. I imagine this is something other compilers have needed as well so I wouldn't be surprised if I'm approaching this wrong.
e.g. given this input:
runtime.h
struct Foo {
int a;
int b;
};
struct Foo * something_with_foo(struct Foo *foo);
I need a bitcode file with this equivalent IR
runtime.ll
; ...etc...
%struct.Foo = type { i32, i32 }
declare %struct.Foo* @something_with_foo(%struct.Foo*)
; ...etc...
I could write it all by hand, but this would be duplicative as I also need to create C headers for other interop and it'd be ideal not to have to keep them in sync manually. The runtime is rather large. I guess I could also do things the other way around: write the declarations in LLVM IR and generate the C headers.
Someone else asked about this years back, but the proposed solutions are rather hacky and fairly impractical for a runtime of this size and type complexity: Clang - Compiling a C header to LLVM IR/bitcode
Clang's precompiled headers implementation does not seem to output LLVM IR, but only the AST (Abstract Syntax Tree) so that the header does not need to be parsed again:
The AST file itself contains a serialized representation of Clang’s
abstract syntax trees and supporting data structures, stored using the
same compressed bitstream as LLVM’s bitcode file format.
The underlying binary format may be the same, but it sounds like the content is different and LLVM's bitstream format is merely a container in this case. This is not very clear from the help page on the website, so I am just speculating. An LLVM/Clang expert could help clarify this point.
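To make the "same container, different content" point concrete, here is a minimal sketch that tells the two kinds of file apart by their first four bytes. It assumes the documented magic numbers: 'B' 'C' 0xC0 0xDE for raw LLVM bitcode, 0x0B17C0DE (stored little-endian) for the bitcode wrapper header, and 'C' 'P' 'C' 'H' for a raw Clang AST/PCH file. The names classify_magic and file_kind are hypothetical and not part of any LLVM or Clang API, and a PCH wrapped in an object-file container would not be recognized by this check.

```c
#include <stddef.h>
#include <string.h>

/* Kinds of file this sketch can tell apart by magic number. */
enum file_kind {
    KIND_UNKNOWN,
    KIND_LLVM_BITCODE,  /* raw LLVM IR bitcode                */
    KIND_LLVM_WRAPPER,  /* bitcode behind a wrapper header    */
    KIND_CLANG_AST      /* raw Clang AST / precompiled header */
};

/* Classify a buffer holding the start of a file.
   Hypothetical helper, not an LLVM or Clang function. */
enum file_kind classify_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char raw_bitcode[4] = { 'B', 'C', 0xC0, 0xDE };
    static const unsigned char wrapper[4]     = { 0xDE, 0xC0, 0x17, 0x0B };
    static const unsigned char clang_ast[4]   = { 'C', 'P', 'C', 'H' };

    if (buf == NULL || len < 4)
        return KIND_UNKNOWN;
    if (memcmp(buf, raw_bitcode, 4) == 0)
        return KIND_LLVM_BITCODE;
    if (memcmp(buf, wrapper, 4) == 0)
        return KIND_LLVM_WRAPPER;
    if (memcmp(buf, clang_ast, 4) == 0)
        return KIND_CLANG_AST;
    return KIND_UNKNOWN;
}
```

A bitcode parser that insists on the first two signatures would reject a file of the third kind, which is consistent with the Invalid bitcode signature error described in the question.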
Unfortunately, there does not seem to be an elegant way around this. What I suggest in order to minimize the effort required to achieve what you want is to build a minimal C/C++ source file that in some way uses all the declarations that you want to be compiled to LLVM IR. For example, you just need to declare a pointer to a struct to ensure it does not get optimized away, and you may just provide an empty definition for a function to keep its signature.
Once you have a minimal source file, compile it with clang -O0 -S -emit-llvm -o precompiled.ll to get a module with all definitions in textual LLVM IR (use -c instead of -S, with a .bc output name, to get bitcode).
An example from the snippet you posted:
#include <stddef.h> /* for NULL */
struct Foo {
int a;
int b;
};
// Fake function definition.
struct Foo * something_with_foo(struct Foo *foo)
{
return NULL;
}
// A global variable.
struct Foo* x;
Output that shows that definitions are kept: https://godbolt.org/g/2F89BH
So, clang doesn't actually filter out the unused declarations. It defers emitting forward declarations until their first use. Whenever a function is used, clang checks whether its declaration has been emitted already and, if not, emits it.
You can look at these lines in the clang repo.
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
if (!FD->doesDeclarationForceExternallyVisibleDefinition())
return;
The simple fix here would be to either comment out the last two lines or just add && false to the second condition.
// Forward declarations are emitted lazily on first use.
if (!FD->doesThisDeclarationHaveABody()) {
if (!FD->doesDeclarationForceExternallyVisibleDefinition() && false)
return;
This will cause clang to emit a declaration as soon as it sees it. It might also change the order in which definitions appear in your .ll (or .bc) files; I'm assuming that is not an issue.
To make it cleaner, you could also add a command-line flag of your own, such as --emit-all-declarations, and check it here before continuing.
Related
I need a tool to automatically flag unused structure members in a C codebase. My definition of "unused" is simple: if the structure member definition is removed from the code and the code still compiles successfully, then the structure member is unused. The question is how this can be done in an automated way (speed isn't much of a concern, as the codebase is small).
The existing Stack Overflow posts on this topic seem to hint that there is no existing static analysis tool that can do this today. On the other hand, given the modularity of Clang, I feel that this should be doable with AST manipulation. Let's take a single file, for example. What I would like to do is the following (this can later be generalized to a set of source files in the codebase):
Generate AST from C code.
Recursively visit all structure field definitions and remove them one by one. We can keep a "seen" dictionary to ensure we don't remove already seen field definition nodes.
Filter the field definitions down to those that are present in the codebase to be analyzed (to avoid definitions in standard libraries, for example).
Compile the code.
If the code compiles successfully, then the corresponding field declaration is unused and is flagged.
Proceed to #1.
The keyword above is remove. How can I remove a field definition? There seem to be two ways using Clang.
At the source level, we can remove the field declaration using the Clang Rewriter (there is a "RemoveText(SourceRange)" option). But I don't know if this will work all the time (e.g. for structures that are autogenerated using macro expansion).
Delete the field declaration node from the AST, and then "re-compile" the AST (whatever that means).
Of the two options above, #1 seems hacky: you need to create a copy of the source file, rewrite it after a field definition is removed, and then recompile the modified source. And I am not sure how well it will work when complex macros are involved in generating structure field definitions.
#2 seems clean, but from Googling, there seems to be no such thing as "deleting an AST node" (the AST is immutable). Please correct me if I am wrong. Even if I succeed in this, how do I proceed from this point to re-evaluate the AST for missing references to structure fields (the "compilation" step)?
Any suggestions appreciated (thanks in advance!). I've already had some initial success with approach #1 above, but I feel that this isn't the right direction.
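To illustrate the macro concern raised for approach #1, here is a small sketch of a struct whose members are generated from an X-macro list. All names in it (PACKET_FIELDS, struct packet, packet_id) are hypothetical. No member has a declaration of its own in the source text, so "removing the field declaration" means editing the macro list, which in turn changes everything else generated from that list.

```c
#include <stddef.h>

/* Hypothetical field list: each X(type, name) entry becomes one member. */
#define PACKET_FIELDS(X) \
    X(int,  id)          \
    X(int,  length)      \
    X(char, flags)

/* Expand the list into member declarations. */
#define AS_MEMBER(type, name) type name;
struct packet { PACKET_FIELDS(AS_MEMBER) };

/* The same list drives other generated code, e.g. a member count. */
#define COUNT_ONE(type, name) + 1
enum { PACKET_FIELD_COUNT = 0 PACKET_FIELDS(COUNT_ONE) };

/* Only 'id' is ever read; 'length' and 'flags' would count as unused. */
int packet_id(const struct packet *p)
{
    return p->id;
}
```

A tool working on source ranges would have to map each generated member back to its X(...) entry before it could try removing it.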
cppcheck can do this. For example:
// test.cpp
struct Struct
{
int used;
int unused;
};
int main()
{
Struct s;
s.used = 0;
return s.used;
}
$ cppcheck test.cpp --enable=all
Checking test.cpp ...
test.cpp:5:9: style: struct member 'Struct::unused' is never used. [unusedStructMember]
int unused;
^
While I used C++ code in the example it behaves the same for C.
I'm a beginner at linking, sorry if my questions are too basic. Let's say I have two .c files:
file1.c is
int main(int argc, char *argv[])
{
int a = function2();
return 0;
}
file2.c is
int function2()
{
return 2018;
}
I know the norm is, create a file2.h and include it in file1.c, and I have some questions:
Q1. The #include in file1.c doesn't seem to make much difference to me. I can still compile file1.c correctly without file2.h; the compiler will just warn me about 'implicit declaration of function 'function2''. But does this warning help a lot? Programmers might know that function2 is defined in another .c file (if you use function2 but don't define it, you certainly know the definition is somewhere else), and the linker will do its job to produce the final executable. So the only purpose of including file2.h, to me, is to avoid warnings during compilation. Is my understanding correct?
Q2. Imagine this scenario: a programmer defines function2 in file1.c and doesn't know that his function2 conflicts with the one in file2.c until the linker throws the error (obviously he can compile his file1.c alone correctly). If we want him to know his mistake when he compiles file1.c, adding file2.h still doesn't help, so what's the purpose of adding the header file?
Q3. What should we add to let the programmer know he should choose a different name for function2, rather than being informed of the error by the linker in the final stage?
Per C89 3.3.2.2 Function calls (emphasis mine):
If the expression that precedes the parenthesized argument list in a function call consists solely of an identifier, and if no declaration is visible for this identifier, the identifier is implicitly declared exactly as if, in the innermost block containing the function call, the declaration
extern int identifier();
appeared
Now, remember that an empty parameter list (nothing inside the () parentheses) declares a function that takes an unspecified number and type of arguments. Write void inside the parentheses to declare that a function takes no arguments, as in int func(void).
Q1:
does this warning help a lot?
Yes and no. This is a subjective question. It helps those who use it. As a personal note, always make this warning an error; with gcc, use -Werror=implicit-function-declaration. But you can also ignore this warning and write the simplest main() { printf("hello world!\n"); } program.
linker will do its job to produce the final executable file? so the only purpose of include file2.h to me is, don't show any warning during compilation, is my understanding correct.
No. If the function is called through a different, incompatible type, the call invokes undefined behavior. If the function is declared as void (*function2(void))(int a); then calling ((int(*)())function2)() is UB, as is calling function2() without a previous declaration. Per Annex J.2 (informative):
The behavior is undefined in the following circumstances:
A pointer is used to call a function whose type is not compatible with the pointed-to type (6.3.2.3).
and per C11 6.3.2.3p8:
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.
So in your lucky case, where the function really is int function2(), this works. It also works for atoi(), for example. But calling atol(), which returns long, without a declaration will invoke undefined behavior.
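The quoted 6.3.2.3p8 rule has a defined half as well as an undefined half. As a minimal sketch (the names int_fn, generic_fn and answer are hypothetical), converting a function pointer to another function pointer type and back yields a pointer that compares equal to the original, and calling through the restored, correct type is well defined. Only a call made through the incompatible type would be undefined.

```c
typedef int  (*int_fn)(void);     /* the function's real pointer type     */
typedef void (*generic_fn)(void); /* an unrelated type, used for storage  */

int answer(void)
{
    return 2018;
}

/* Store the pointer under a different type, convert back, then call. */
int call_via_round_trip(void)
{
    generic_fn stored = (generic_fn)answer;
    int_fn restored = (int_fn)stored;
    return restored(); /* defined: called through the original type */
}

/* Returns 1 if the round-tripped pointer equals the original. */
int round_trip_preserves_pointer(void)
{
    generic_fn stored = (generic_fn)answer;
    return (int_fn)stored == answer;
}
```

Calling stored() directly, without converting back, is the case the standard leaves undefined.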
Q2:
the linker throws the error
This should happen, but it is really linker dependent. If you compile all sources in a single stage with gcc, it will throw an error. But if you create static libraries and then link them using gcc without -Wl,-whole-archive, the linker will pick the first definition it sees; see this thread.
what's the purpose of adding header file?
I guess simplicity and order. It is a convenient and standard way to share data structures (enums, structs, typedefs) and declarations (function and variable types) between developers and libraries, and also to share preprocessor directives. Imagine you are writing a big library with over 1000 files that will work with over 100 other libraries. At the beginning of each file, would you write struct mydata_s { int member1; int member2; ... }; int printf(const char*, ...); int scanf(const char *, ...); etc., or just #include "mydata.h" and #include <stdio.h>? If you needed to change the mydata_s structure, you would need to change all files in your project, and all the other developers who use your library would need to change the definition too. I don't say you can't do it, but it would be more work, and no one would use your library.
Q3:
What should we add to let the programmer know he should choose a different name for function2 rather than be informed the error by linker in the final stage.
In case of name clashes you will be informed (hopefully) by the linker that it found two identifiers with the same name. To find out earlier, you would need to create a tool that checks your sources for exactly that. I don't see the need for this: the linker is specifically made to resolve symbols, so it naturally handles the case where two symbols with the same identifier exist.
Short answer:
Takeaway: the earlier the compiler alerts you, the better.
Q1: meaning of .h: consistency and early alerts. Alerting early on common ways of going wrong improves reliability of code and adds up to less debugging and production crashes.
Q2: Clashing Names bring early alerts to developers, which are usually easier to fix.
Q3: Early duplicate definition alerts are not baked into the C standard.
Exercises:
1. Define a function in one file that printf("%d\n",i) an int argument then call that function in another file with a float of 42.0.
2. Call with (double)42.0.
3. Define a function with a char *str argument printed with %s, then call it with an int argument.
Longer answers:
Popular convention: in typical use the name of the .h file is derived from the .c file, or files, it is associated with: file.h and file.c. For .h files with many definitions, say string.h, derive the file name from a higher-level perspective of what's within (as in the str... functions).
My big rule: it’s always better to structure your code so compilers can immediately alert on bugs at compile time rather than letting them slide through to debug or run time, where finding them depends on code actually running in just the right way. Run-time errors can be very difficult to diagnose, especially if they hit long after the program is in production; they are expensive in maintenance and degrade your customer experience. See "yoda notation".
Q1: meaning of .h: consistency and early alerts and improved reliability of code.
C .h files allow developers of .c files compiled at different times to share common declarations without duplicating code. .h files also allow functions to be called consistently from all files while identifying improper argument signatures (argument counts, type clashes, etc.). Having the .c files that define functions also #include the .h file helps ensure the arguments in the definition are consistent with the calls; this may sound elementary, but without it all the human errors of signature clashes can sneak through.
Omitting .h files only works if the argument signatures of all callers perfectly match those in the definitions. This is often not the case, so without .h files any clashing signatures would produce bad numbers unless you also had parallel externs in the calling file (bad bad bad). Things like int vs float can produce spectacularly wrong argument values. Bad pointers can produce segmentation faults and other total crashes.
Advantage: with externs in .h files, compilers can correctly convert mismatching arguments to the declared type, assuring better calls. While you can still botch arguments, it’s much less likely. It also helps avoid conditions where the mismatches work on one implementation but not another.
Implicit declaration warnings are hugely helpful to me, as they usually indicate I’ve forgotten a .h file or misspelled an external name.
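A minimal sketch of that conversion, with hypothetical function names: because a prototype for half is in scope, the int argument 42 is converted to double before the call. Without the prototype, an older compiler would pass it as an int and the callee would read garbage.

```c
/* Prototype, as it would appear in a shared .h file. */
double half(double x);

/* Definition, as it would appear in the corresponding .c file. */
double half(double x)
{
    return x / 2.0;
}

/* Caller in another file: the int constant 42 is converted to 42.0
   because the prototype above tells the compiler the parameter type. */
double half_of_int(void)
{
    return half(42);
}
```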
Q2: Clashing Names. Early alerts.
Clashing names are bad, and it is the developer's responsibility to avoid problems. C++ solves the issue with namespaces, which C, being a lower-level language, does not have.
Use of .h files can let compiler diagnostics alert developers to clashes early in the game. If compiler diagnostics don’t do this, hopefully linkers will do so with multiply-defined symbol errors, but this is not guaranteed by the standard.
A common way to fake name spaces is by starting all potentially clashing definitions in a .h with some prefix (extern int filex_function1(int arg, char *string) or #define FILEX_DEF 42).
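A minimal sketch of the prefix convention: filex_function1 and FILEX_DEF come from the text above, while the second module, filey, and both function bodies are hypothetical. Two modules can each expose a "function1" without colliding at link time because the prefix makes the external names distinct.

```c
#include <string.h>

/* ---- would live in filex.h / filex.c ---- */
#define FILEX_DEF 42
int filex_function1(int arg, const char *string)
{
    return arg + (int)strlen(string);
}

/* ---- would live in filey.h / filey.c (hypothetical second module) ---- */
#define FILEY_DEF 7
int filey_function1(int arg, const char *string)
{
    return arg - (int)strlen(string);
}
```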
What to do if two different external libraries being used share the same names is beyond the scope of this answer.
Q3: early duplicate alerts. Sorry… early alerts are implementation dependent.
This would be difficult for the C standard to define. As C is an old language there are many creative different ways C programs are written and stored.
Hunting for clashing names before using them is up to the developer. Tools like cross reference programs can help. Even something stupid like ctags associated with vim or emacs can help.
You misunderstand the usage of header files and function prototypes.
Header files are needed to share common information between multiple code files. Such information includes macro definitions, data types, and, possibly, function prototypes.
Function prototypes are needed for the compiler to correctly handle return data types and to give you early warnings of misuse of function return types and arguments.
Function prototypes can be declared in header files or can be declared in the files which use them (more typing).
You have a very simple example, with just 2 files. Now imagine a project with hundreds of files and thousands of functions. You will be lost in linker errors.
C allows you to use an undeclared function for legacy reasons. In this situation it assumes that the function has a return type of 'int'. However, modern data types have a bigger variety than in the early days. A function can return pointers, 64-bit data, or structures. To express that, you must use prototypes or nothing will work. The compiler has to know how to handle function returns correctly.
Also, it can give you warnings about incorrect use of argument types. Due to legacy, those are still warnings in C, but they were addressed early in C++ and converted to errors.
Those warnings give you early debugging capabilities. Type mismatch warnings can save you days of debugging in some cases.
So, in your example you do not need the header file. You can prototype the function in the 'main' file using the 'extern' syntax. You can even do without prototyping. However, in the real modern programming world you cannot allow the latter, in particular when you work in a team or want your program to be maintainable.
It is a good idea to store your function prototypes in header files. This would be a good documentation source, in particular with good comments. BTW, function names must make sense to be maintainable.
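As a small sketch of the return-type point (the function names are hypothetical): the two functions below return a 64-bit value and a pointer. With the prototypes in scope the caller handles both correctly. Under the legacy implicit-int assumption, a caller without them would treat each result as a plain int.

```c
#include <stdint.h>

/* Prototypes, as they would appear in a shared header. */
int64_t big_value(void);
const char *greeting(void);

/* Definitions, as they would appear in the implementing .c file. */
int64_t big_value(void)
{
    return INT64_C(5000000000); /* does not fit in 32 bits */
}

const char *greeting(void)
{
    return "hello";
}
```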
Q1. Yes. C is a low level language, and was historically used to bind low level constructs into higher level concepts. For example, traditionally the label _end is at the last address in a program. The label is typeless but you can declare it as any type that is convenient to you. A "properly typed" language would make this sort of abuse difficult.
Q2. By convention, both file1.c and file2.c would include file2.h; one as consumer, the other as producer. Following this simple idiom will catch declaration vs definition errors; although again, the "warning" is not necessarily enforced.
Q3. Many software organizations take a "warnings are errors" rule to socially control their programmers.
I am thinking about the following problem: I want to program a microcontroller (let's say an AVR mega type) with a program that uses some sort of look-up tables.
The first attempt would be to locate the table in a separate file and create it using any other scripting language/program/.... In this case there is quite some effort in creating the necessary source files for C.
My thought was now to use the preprocessor and compiler to handle things. I tried to implement this with a table of sine values (just as an example):
#include <avr/io.h>
#include <math.h>
#define S1(i,n) ((uint8_t)(sin(M_PI*(i)/(n))*255))
#define S4(i,n) S1(i,n), S1(i+1,n), S1(i+2,n), S1(i+3,n)
uint8_t lut[] = {S4(0,4)};
void main()
{
uint8_t val, i;
for(i=0; i<4; i++)
{
val = lut[i];
}
}
If I compile this code I get warnings about the sin function. Furthermore, in the assembly there is nothing in the .data section. If I just remove the sin in the third line, I get the data in the assembly. Clearly all the information is available at compile time.
Can you tell me if there is a way to achieve what I intend, i.e. have the compiler calculate as many values as possible offline? Or is the best way to go using an external script/program/... to calculate the table entries and add these to a separate file that will just be #included?
The general problem here is that the sin call makes this initialization invalid according to the rules of the C language: the call is not a constant expression, and you are initializing an array with static storage duration, which requires one. This also explains why your array is not in the .data section.
C11 (N1570) §6.6/2,3 Constant expressions (emphasis mine)
A constant expression can be evaluated during translation rather than
runtime, and accordingly may be used in any place that a constant may
be.
Constant expressions shall not contain assignment, increment,
decrement, function-call, or comma operators, except when they are
contained within a subexpression that is not evaluated.
However, as per @ShafikYaghmour's comment, GCC will replace the sin function call with its built-in counterpart (unless the -fno-builtin option is present), which is likely to be treated as a constant expression. According to 6.57 Other Built-in Functions Provided by GCC:
GCC includes built-in versions of many of the functions in the
standard C library. The versions prefixed with __builtin_ are always
treated as having the same meaning as the C library function even if
you specify the -fno-builtin option.
What you are trying is not part of the C language. In situations like this, I have written code following this pattern:
#if GENERATE_SOURCECODE
int main (void)
{
... Code that uses printf to write C code to stdout
}
#else
// Source code generated by the code above
... Here I paste in what the code above generated
// The rest of the program
#endif
Every time you need to change it, you run the code with GENERATE_SOURCECODE defined, and paste in the output. Works well if your code is self contained and the generated output only ever changes if the code generating it changes.
First of all, it should go without saying that you should evaluate (probably by experiment) whether this is worth doing. Your lookup table is going to increase your data size and programmer effort, but may or may not provide a runtime speed increase that you need.
If you still want to do it, I don't think the C preprocessor can do it straightforwardly, because it has no facilities for iteration or recursion.
The most robust way to go about this would be to write a program in C or some other language to print out C source for the table, and then include that file in your program using the preprocessor. If you are using a tool like make, you can create a rule to generate the table file and have your .c file depend on that file.
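A generator of this kind might look like the sketch below. It formats a C definition of the table into a buffer that the build step can then write to a file. Note one deliberate substitution: instead of calling sin(), it uses Bhaskara I's rational approximation, sin(x) ≈ 16x(π−x) / (5π² − 4x(π−x)), rewritten in integer arithmetic, purely so the sketch has no libm dependency. A real host-side generator could simply call sin(). The names sine_entry and format_table are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>

/* Approximate 255*sin(pi*i/n) for 0 <= i <= n with Bhaskara I's formula,
   using integers only. With x = pi*i/n the formula reduces to
   16*i*(n-i) / (5*n*n - 4*i*(n-i)). Returns 0 for out-of-range input. */
uint8_t sine_entry(unsigned i, unsigned n)
{
    unsigned long p, denom;

    if (n == 0 || i > n)
        return 0;
    p = (unsigned long)i * (n - i);
    denom = 5UL * n * n - 4UL * p;
    return (uint8_t)((255UL * 16UL * p) / denom);
}

/* Format "const uint8_t lut[n] = {...};" into out.
   Returns the number of characters written, or -1 if out is too small. */
int format_table(char *out, size_t cap, unsigned n)
{
    size_t used;
    unsigned i;
    int w = snprintf(out, cap, "const uint8_t lut[%u] = {", n);

    if (w < 0 || (size_t)w >= cap)
        return -1;
    used = (size_t)w;
    for (i = 0; i < n; i++) {
        w = snprintf(out + used, cap - used, "%s%u",
                     i ? ", " : "", (unsigned)sine_entry(i, n));
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    w = snprintf(out + used, cap - used, "};\n");
    if (w < 0 || (size_t)w >= cap - used)
        return -1;
    return (int)(used + (size_t)w);
}
```

For n = 4 this produces const uint8_t lut[4] = {0, 180, 255, 180}; and a make rule can redirect that output into the file the firmware source includes.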
On the other hand, if you are sure you are never going to change this table, you could write a program to generate it once and just paste it in.
(I found this question which is similar but not a duplicate:
How to check validity of header file in C programming language )
I have a function implementation, and a non-matching prototype (same name, different types) which is in a header file. The header file is included by a C file that uses the function, but is not included in the file that defines the function.
Here is a minimal test case :
header.h:
void foo(int bar);
File1.c:
#include "header.h"
int main (int argc, char * argv[])
{
int x = 1;
foo(x);
return 0;
}
File2.c:
#include <stdio.h>
typedef struct {
int x;
int y;
} t_struct;
void foo (t_struct *p_bar)
{
printf("%x %x\n", p_bar->x, p_bar->y);
}
I can compile this with VS 2010 with no errors or warnings, but unsurprisingly it segfaults when I run it.
The compiler is fine with it (this I understand)
The linker did not catch it (this I was slightly surprised by)
The static analysis tool (Coverity) did not catch it (this I was very surprised by).
How can I catch these kinds of errors?
[Edit: I realise if I #include "header.h" in file2.c as well, the compiler will complain. But I have an enormous code base and it is not always possible or appropriate to guarantee that all headers where a function is prototyped are included in the implementation files.]
Have the same header file included in both file1.c and file2.c. This will pretty much prevent a conflicting prototype.
Otherwise, such a mistake cannot be detected by the compiler because the source code of the function is not visible to the compiler when it compiles file1.c. Rather, it can only trust the signature that has been given.
At least theoretically, the linker could be able to detect such a mismatch if additional metadata is stored in the object files, but I am not aware if this is practically possible.
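Where including the header in the defining file is not an option, one possible sketch is a compile-time check placed next to the definition. It assumes a C11 compiler (so it would not apply to VS 2010 as it stands), and the macro names are hypothetical. _Generic selects on the type of &foo, so a _Static_assert built on it fails to compile when the definition drifts from the signature the header promises.

```c
typedef struct {
    int x;
    int y;
} t_struct;

/* The definition, with the signature used in File2.c. */
void foo(t_struct *p_bar)
{
    (void)p_bar;
}

/* 1 if foo has the type header.h promises, void(int); otherwise 0. */
#define FOO_MATCHES_HEADER \
    _Generic(&foo, void (*)(int): 1, default: 0)

/* 1 if foo has the type it is actually defined with; otherwise 0. */
#define FOO_MATCHES_DEFINITION \
    _Generic(&foo, void (*)(t_struct *): 1, default: 0)

/* In real code the check would be
     _Static_assert(FOO_MATCHES_HEADER, "foo does not match header.h");
   which fails to compile for this example, so only the passing
   assertion is left active here. */
_Static_assert(FOO_MATCHES_DEFINITION, "foo matches its own definition");
```

The expected signature still has to be written down a second time, so this only narrows the gap; sharing one header remains the simpler fix where it is possible.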
Use -Werror-implicit-function-declaration, -Wmissing-prototypes, or the equivalent on one of your supported compilers. The compiler will then either error or complain if a declaration does not precede the definition of a global function.
Compiling the programs in some form of strict C99 mode should also generate these messages. GCC, ICC, and Clang all support this feature (not sure about MS's C compiler and its current status, as VS 2005 or 2008 was the latest I've used for C).
You may use the Frama-C static analysis platform available at http://frama-c.com.
On your examples you would get:
$ frama-c 1.c 2.c
[kernel] preprocessing with "gcc -C -E -I. 1.c"
[kernel] preprocessing with "gcc -C -E -I. 2.c"
[kernel] user error: Incompatible declaration for foo:
different type constructors: int vs. t_struct *
First declaration was at header.h:1
Current declaration is at 2.c:8
[kernel] Frama-C aborted: invalid user input.
Hope this helps!
It looks like this is not possible with a C compiler because of the way function names are mapped to symbol names in object files (directly, without considering the actual signature).
But it is possible with C++, because C++ uses name mangling that depends on the function signature. So in C++, void foo(int) and void foo(t_struct*) will have different names at link time, and the linker will raise an error about the unresolved one.
Of course, it will not be easy to switch a huge C codebase to C++. But you can use a relatively simple workaround: e.g. add a single .cpp file to your project and include all the C files in it (in practice, generate it with a script).
Taking your example and VS2010 I added TestCpp.cpp to project:
#include "stdafx.h"
namespace xxx
{
#include "File1.c"
#include "File2.c"
}
Result is linker error LNK2019:
TestCpp.obj : error LNK2019: unresolved external symbol "void __cdecl xxx::foo(int)" (?foo@xxx@@YAXH@Z) referenced in function "int __cdecl xxx::main(int,char * * const)" (?main@xxx@@YAHHQAPAD@Z)
W:\TestProjects\GenericTest\Debug\GenericTest.exe : fatal error LNK1120: 1 unresolved externals
Of course, this will not be so easy for a huge codebase; there can be other problems leading to compilation errors that cannot be fixed without changing the codebase. You can partially mitigate this by protecting the .cpp file contents with a conditional #ifdef and using it only for periodic checks rather than for regular builds.
Every (non-static) function defined in every foo.c file should have a prototype in the corresponding foo.h file, and foo.c should have #include "foo.h". (main is the only exception.) foo.h should not contain prototypes for any functions not defined in foo.c.
Every function should be prototyped exactly once.
You can have .h files with no corresponding .c files if they don't contain any prototypes. The only .c file without a corresponding .h file should be the one containing main.
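As a compact illustration of these rules, the sketch below shows two modules in a single listing, with comments marking where each file would begin. The names foo_add and bar_sum3 are hypothetical. Each .c file includes its own header, each function is prototyped exactly once, and bar.c includes foo.h because it calls a function defined in foo.c.

```c
/* ---- foo.h ---- */
#ifndef FOO_H
#define FOO_H
int foo_add(int a, int b); /* the only prototype for foo_add */
#endif /* FOO_H */

/* ---- bar.h ---- */
#ifndef BAR_H
#define BAR_H
int bar_sum3(int a, int b, int c); /* the only prototype for bar_sum3 */
#endif /* BAR_H */

/* ---- foo.c: begins with #include "foo.h" ---- */
int foo_add(int a, int b)
{
    return a + b;
}

/* ---- bar.c: begins with #include "bar.h" and #include "foo.h" ---- */
int bar_sum3(int a, int b, int c)
{
    return foo_add(foo_add(a, b), c);
}
```

Because foo.c sees its own prototype, any mismatch between foo.h and the definition becomes a compile error in foo.c itself.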
You already know this, and your problem is that you have a huge code base where this rule has not been followed.
So how do you get from here to there? Here's how I'd probably do it.
Step 1 (requires a single pass over your code base):
For each file foo.c, create a file foo.h if it doesn't already exist. Add #include "foo.h" near the top of foo.c. If you have a convention for where .h and .c files should live (either in the same directory or in parallel include and src directories), follow it; if not, try to introduce such a convention.
For each function in foo.c, copy its prototype to foo.h if it's not already there. Use copy-and-paste to ensure that everything stays consistent. (Parameter names are optional in prototypes and mandatory in definitions; I suggest keeping the names in both places.)
Do a full build and fix any problems that show up.
This won't catch all your problems. You could still have multiple prototypes for some functions. But you'll have caught any cases where two headers have inconsistent prototypes for the same function and both headers are included in the same translation unit.
Once everything builds cleanly, you should have a system that's at least as correct as what you started with.
Step 2:
For each file foo.h, delete any prototypes for functions that aren't defined in foo.c.
Do a full build and fix any problems that show up. If bar.c calls a function that's defined in foo.c, then bar.c needs a #include "foo.h".
For both of these steps, the "fix any problems that show up" phase is likely to be long and tedious.
If you can't afford to do all this at once, you can probably do a lot of it incrementally. Start with one or a few .c files, clean up their .h files, and remove any extra prototypes declared elsewhere.
Any time you find a case where a call uses an incorrect prototype, try to figure out the circumstances in which that call is executed, and how it causes your application to misbehave. Create a bug report and add a test to your regression test suite (you have one, right?). You can demonstrate to management that the test now passes because of all the work you've done; you really weren't just messing around.
Automated tools that can parse C are likely to be useful. Ira Baxter has some suggestions. ctags may also be useful. Depending on how your code is formatted, you can probably throw together some tools that don't require a full C parser. For example, you might use grep, sed, or perl to extract a list of function definitions from a foo.c file, then manually edit the list to remove false positives.
It's obvious ("I have a huge code base") that you cannot do this by hand.
What you need is an automated tool that can read your source files as the compiler sees them, collect all function prototypes and definitions, and verify that all definitions/prototypes match. I doubt you'll find such a tool lying around.
Of course, this match must check the signatures, and that requires something like the compiler's front end to compare them.
Consider
typedef int T;
void foo(T x);
in one compilation unit, and
typedef float T;
void foo(T x);
in another. You can't just compare the signature "lines" for equality; you need something that can resolve the types when checking.
GCCXML may be able to help, if you are using a GCC dialect of C; it extracts top-level declarations from source files as XML chunks. I don't know if it will resolve typedefs, though. You obviously have to build (considerable) support to collect the definitions in a central place (a database) and compare them. Comparing XML documents for equivalence is at least reasonably straightforward, and pretty easy if they are formatted in a regular way. This is likely your easiest bet.
If that doesn't work, you need something that has a full C front end that you can customize. GCC is famously available, and famously hard to customize. Clang is available, and might be pressed into service for this, but AFAIK only works with GCC dialects.
Our DMS Software Reengineering Toolkit has C front ends (with full preprocessing capability) for many dialects of C (GCC, MS, GreenHills, ...) and builds symbol tables with complete type information. Using DMS you might be able (depending on the real scale of your application) to simply process all the compilation units, and build just the symbol tables for each compilation unit. Checking that symbol table entries "match" (are compatible according to compiler rules including using equivalent typedefs) is built-into the C front ends; all one needs to do is orchestrate the reading, and calling the match logic for all symbol table entries at global scope across the various compilation units.
Whether you do this with GCC/Clang/DMS, it is a fair amount of work to cobble together a custom tool. So you have to decide how critical your need for fewer surprises is, compared to the energy required to build such a custom tool.
How does the compiler know the prototype of the sleep function or even the printf function, when I did not include any header file in the first place?
Moreover, if I specify sleep(1,1,"xyz") or any arbitrary number of arguments, the compiler still compiles it.
But the strange thing is that gcc is able to find the definition of this function at link time. I don't understand how this is possible, because the actual sleep() function takes only a single argument, but our program passed three.
/********************************/
int main()
{
short int i;
for(i = 0; i<5; i++)
{
printf("%d",i);
sleep(1);
}
return 0;
}
Lacking a more specific prototype, the compiler will assume that the function returns int and takes whatever number of arguments you provide.
Depending on the CPU architecture arguments can be passed in registers (for example, a0 through a3 on MIPS) or by pushing them onto the stack as in the original x86 calling convention. In either case, passing extra arguments is harmless. The called function won't use the registers passed in nor reference the extra arguments on the stack, but nothing bad happens.
Passing in fewer arguments is more problematic. The called function will use whatever garbage happened to be in the appropriate register or stack location, and hijinks may ensue.
In classic C, you don't need a prototype to call a function. The compiler will infer that the function returns an int and takes an unknown number of parameters. This may work on some architectures, but it will fail if the function returns something other than int, like a structure, or if there are any parameter conversions.
In your example, sleep is seen and the compiler assumes a prototype like
int sleep();
Note that the argument list is empty. In C, this is NOT the same as void. This actually means "unknown". If you were writing K&R C code, you could have unknown parameters through code like
int sleep(t)
int t;
{
/* do something with t */
}
This is all dangerous, especially on some embedded chips where the way parameters are passed for an unprototyped function differs from one with a prototype.
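To illustrate the parameter-conversion point, here is a small sketch with a prototype in scope. half and half_of_three are hypothetical names, not something from the question.

```c
/* Prototype in scope: the compiler knows the parameter and return types. */
double half(double x);

double half_of_three(void)
{
    /* The int constant 3 is converted to the double 3.0 at the call,
       because the prototype above is visible. */
    return half(3);
}

double half(double x)
{
    return x / 2.0;
}
```

Without the prototype line, a classic C compiler would pass the int 3 unconverted and assume an int return value, which is the failure mode described above.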
Note: prototypes aren't needed for linking. Usually, the linker automatically links with a C runtime library like glibc on Linux. The association between your use of sleep and the code that implements it happens at link time long after the source code has been processed.
I'd suggest that you use the feature of your compiler to require prototypes to avoid problems like this. With GCC, it's the -Wstrict-prototypes command line argument. In the CodeWarrior tools, it was the "Require Prototypes" flag in the C/C++ Compiler panel.
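For the program in the question, a sketch of the fix is to include the headers that declare the functions being called. The loop is wrapped in a hypothetical helper, count_with_pause, so that the pause length can be varied; unistd.h assumes a POSIX system.

```c
#include <stdio.h>   /* declares printf */
#include <unistd.h>  /* declares sleep on POSIX systems */

/* Prints 0..n-1, sleeping pause_seconds after each number.
   Returns how many numbers were printed. */
int count_with_pause(int n, unsigned int pause_seconds)
{
    int i;
    for (i = 0; i < n; i++) {
        printf("%d", i);
        sleep(pause_seconds);
    }
    return i;
}
```

With these headers in place, a call such as sleep(1,1,"xyz") is rejected at compile time instead of slipping through.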
C will guess int for unknown types. So, it probably thinks sleep has this prototype:
int sleep(int);
As for giving multiple parameters and linking...I'm not sure. That does surprise me. If that really worked, then what happened at run-time?
This is to do with something called 'K & R C' and 'ANSI C'.
In good old K & R C, if something is not declared, it is assumed to be int.
So anything that looks like a function call, but is not declared as a function,
will automatically get a return type of 'int' and argument types depending
on the actual call.
However, people later figured out that this can be very bad sometimes, so
several compilers added a warning. C++ made this an error. I think gcc has some
flag (-ansi or -pedantic?) that makes this condition an error;
-Werror=implicit-function-declaration certainly does.
So, in a nutshell, this is historical baggage.
Other answers cover the probable mechanics (all guesses as compiler not specified).
The issue that you have is that your compiler and linker have not been set to enable every possible error and warning. For any new project there is (virtually) no excuse for not doing so. For legacy projects there is more excuse, but you should still strive to enable as many as possible.
It depends on the compiler, but with gcc (for example, since that's the one you referred to), some of the standard (both C and POSIX) functions are known to the compiler as builtins ("compiler intrinsics"), so it already knows their types. The compiler will allow an implicit declaration (i.e., using the function without a header), and the linker will find the implementation because you're probably using the compiler as a linker front-end, which links the standard C library by default.
Try compiling your objects with the '-c' flag (compile only, no link), and then link them directly using the linker. You will find that you get the linker errors you expect.
Alternatively, gcc supports options to disable the use of intrinsics: -fno-builtin or for granular control, -fno-builtin-function. There are further options that may be useful if you're doing something like building a homebrew kernel or some other kind of on-the-metal app.
In a non-toy example another file may include the one you missed. Reviewing the output from the pre-processor is a nice way to see what you end up with compiling.