How does C exactly run?

I just realized when I define a function in C and use it, I can either use it and define the function later or define it and use it later. For example,
#include <stdio.h>
int mult (int x, int y)
{
return x * y;
}
int main()
{
int x;
int y;
scanf( "%d", &x );
scanf( "%d", &y );
printf( "The product of your two numbers is %d\n", mult( x, y ) );
}
and
#include <stdio.h>
int main()
{
int x;
int y;
scanf( "%d", &x );
scanf( "%d", &y );
printf( "The product of your two numbers is %d\n", mult( x, y ) );
}
int mult (int x, int y)
{
return x * y;
}
will both run just fine. However, in Python, the second code would fail, since it requires mult(x, y) to be defined before you can use it and Python executes from top to bottom (as far as I know). Obviously, that can't be the case in C, since the second one runs just fine. So how does C code actually flow?

Well, the second code is not valid C, strictly speaking.
It relies on your compiler's flexibility to allow an implicit declaration of the function, which the C standard has disallowed since C99.
The C11 standard explicitly mentions the removal in its "Foreword":
Major changes in the second edition included:
...
remove implicit function declaration
You have to either:
Forward declare the function, or
Define the function before its use (like snippet 1).
In addition, enable warnings in your compiler (for example -Wall with GCC) and it should produce a warning message to let you know about this problem.

As others have noted, routines should be declared before they are used, although they do not need to be defined before they are used. Additionally, older versions of C allowed some implicit declaration of routines, and some compilers still do, although this is largely archaic now.
As to how C is able to support the calling of functions before they are defined, C programs are first translated into some executable format, after which a program is executed.
During translation, a C compiler reads, analyzes, and processes the entire program. Any references to functions that have not yet been defined are recorded as things that need to be resolved in the program. In the process of preparing a final executable file, a linker goes through all of the processed data, finds the function definitions, and resolves the references by inserting the addresses (or other information) of the called routines.
Most commonly, a C compiler translates source code into an object module. The object module contains machine-language instructions for the program. It also contains any data for the program that is defined in the source code, and it contains information about unresolved references that the compiler found while analyzing the source code. Multiple source files may be translated separately into multiple object modules. Sometimes these translations are done by different people at different times. A company might produce a software library which is the result of translating their source files into object modules and packaging them into a library file. Then a software developer would compile their own source files and link the resulting object modules with the object modules in the library.
Multiple object modules can be linked together to make an executable file. This is a file that the operating system is able to load into memory and execute.

For the second code you should use a forward declaration. That means declaring the function first, so that the compiler knows about it before you use it. As it stands, your code only compiles because of your C compiler's flexibility.
Python is not flexible in this way: calling a function before its def statement has run fails with a NameError at run time.

You said both snippets work just fine. Well, they don't. The second snippet produces an error with my compiler, and even if it does compile, it should not be used.
It violates the C standard.
Declaring functions up front is especially helpful when a program has many user-defined functions.

Related

C function that only returns input parameter

For reasons out of my control, I have to implement this function in my C code:
double simple_round(double u)
{
return u;
}
When this function is called, is it ignored by the compiler, or does the call take place anyway? For instance:
int y;
double u = 3.3;
y = (int)simple_round(u*5.5); //line1
y = (int)(u*5.5); //line2
Will both lines of code take the same time to be executed, or will the first one take longer?
Because the function is defined in a different C file from where it's used, if you don't use link-time optimization, the compiler won't know what the function does when it compiles the call, so it will have to emit an actual function call. The function itself will probably be just a couple of instructions: copy the argument to the return value, then return.
The extra function call may or may not slow down the program, depending on the type of CPU and what else the CPU is doing (the other instructions nearby).
It will also force the compiler to consider that it might be calling a very complicated function that overwrites lots of registers (whichever ones are allowed to be overwritten by a function call); this will make the register allocation worse in the function that calls it, perhaps making that function longer and making it need to do more memory accesses.
When this function is called, is it ignored by the compiler, or does the call take place anyway?
It depends. If the function definition is in the same *.c file as the places where it's called, the compiler will very likely inline it automatically, because compilers have heuristics for inlining very simple functions or functions that are called only once. Of course, you have to enable a high enough optimization level (e.g. -O2).
But if the function definition is in another compilation unit, the compiler can't help unless you use link-time optimization (LTO, e.g. -flto in GCC and Clang). That's because in C each *.c file is a separate compilation unit that is compiled to a separate object (*.o) file, and the compiler doesn't know the bodies of functions in other compilation units. Only at the link stage are the unresolved identifiers filled in with information from the other compilation units.
In this case, where generated code in one *.c file calls a function that you implement in another *.c file, there are several more reliable solutions.
The most correct method is to fix the generator: provide evidence that the function the generated code calls is pointless, and get it fixed.
If you really have no way to fix the generator, one possible way is to remove the generated *.c file from the compilation list (i.e. don't compile it into a *.o anymore) and include it in your own *.c file:
#define simple_round(x) (x)
#include "generated.c"
#undef simple_round
Now the simple_round() calls in generated.c will be replaced by their arguments, so no function call is made.
If the 'generated' code has to be compiled anyway, perhaps you can 'kludge' a macro that redefines the call to the 'inefficient' rounding function made by that code.
Here's a notion (all in one file). Perhaps the #define can be 'shimmed in' (and documented!) into the makefile entry for that single source file.
#include <stdio.h>
int fnc1( int x ) { return 5 * x; }
int main( void ) {
printf( "%d\n", fnc1( 5 ) );
#define fnc1(x) (x)
printf( "%d\n", fnc1( 7 ) );
return 0;
}
Output:
25
7

consequences of multiple weak symbols(C Linker)

I just have a question about the potential problems of multiple weak symbols; this question is from my textbook:
One module A:
int x;
int y;
p1() {...}
the other module B:
double x;
p2() {...}
and my textbook says that 'write to x in p2 might overwrite y'
I can kind of get the idea of the textbook (double x is twice the size of int x, and int y is placed right after int x, so here comes the problem), but I'm still lost in the details. I know that when there are multiple weak symbols, the linker will just pick one arbitrarily, so my question is: which module's x must the linker choose for a write to x in p2 to overwrite y?
This is my understanding: if the linker chooses the int x of module A, that will cause the problem, because then x and y are both 4 bytes, and the store in p2 (I imagine that after compilation p2 uses a movq where p1 uses a movl) will change 8 bytes and therefore overwrite y.
But my instructor said that only if the linker chooses the double x of module B will writing to x overwrite y. How come? Am I correct, or is my instructor?
According to ISO C, the program invokes undefined behavior. An external name which is used must have exactly one definition somewhere in the program.*
"Weak symbols" are a concept in some dynamic library systems like ELF on GNU/Linux. That terminology does not apply here. A linker which allows multiple definitions of an external symbol is said to be implementing the "relaxed ref/def" model. This term comes from section 6.1.2.2 of the ANSI C rationale.
If we regard the relaxed ref/def model as a documented language extension, then the multiple definitions of a name become locally defined behavior. However, what if they are inconsistently typed? That is almost certainly undefined by the reasoning that the situation resembles bad type aliasing. It is possible that if one module has int x; int y; and the other has double x, that a write through the double x alias will clobber y. This isn't something you can portably rely on. It's a very poor way to obtain an aliasing effect on purpose; you want to use a union between two structures or some such.
Now about "weak symbols": those are external names in shared libraries that can be overridden by alternative definitions. For instance, most of the functions in the GNU C library on a GNU/Linux system are weak symbols. A program can define its own read function to replace the POSIX one, for instance. The library itself will not break no matter how read is redefined; when it needs to call read, it doesn't use the weak symbol read but some internal alias like __libc_read.
This mechanism is important; it allows the library to conform to ISO C. A strictly conforming ISO C program is allowed to use read as an external name.
* In the ISO C99 standard, this was given in 6.9 External Definitions: "If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one."

initialising constant static array with algorithm [duplicate]

I am thinking about the following problem: I want to program a microcontroller (let's say an AVR mega type) with a program that uses some sort of look-up tables.
The first attempt would be to locate the table in a separate file and create it using any other scripting language/program/.... In this case there is quite some effort in creating the necessary source files for C.
My thought was now to use the preprocessor and compiler to handle things. I tried to implement this with a table of sine values (just as an example):
#include <avr/io.h>
#include <math.h>
#define S1(i,n) ((uint8_t)(255 * sin(M_PI * (i) / (n))))
#define S4(i,n) S1(i,n), S1(i+1,n), S1(i+2,n), S1(i+3,n)
uint8_t lut[] = {S4(0,4)};
int main(void)
{
uint8_t val, i;
for(i=0; i<4; i++)
{
val = lut[i];
}
}
If I compile this code I get warnings about the sin function. Furthermore, in the assembly there is nothing in the .data section. If I just remove the sin in the third line I get the data in the assembly. Clearly all the information is available at compile time.
Can you tell me if there is a way to achieve what I intend, i.e. the compiler calculates as many values offline as possible? Or is the best way to use an external script/program/... to calculate the table entries and put them in a separate file that is just #included?
The general problem here is that sin call makes this initialization de facto illegal, according to rules of C language, as it's not constant expression per se and you're initializing array of static storage duration, which requires that. This also explains why your array is not in .data section.
C11 (N1570) §6.6/2,3 Constant expressions (emphasis mine)
A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be.
Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.115)
However, as @ShafikYaghmour pointed out in a comment, GCC replaces the sin function call with its built-in counterpart (unless the -fno-builtin option is given), which it is likely to fold into a constant. According to 6.57 Other Built-in Functions Provided by GCC:
GCC includes built-in versions of many of the functions in the standard C library. The versions prefixed with __builtin_ are always treated as having the same meaning as the C library function even if you specify the -fno-builtin option.
What you are trying is not part of the C language. In situations like this, I have written code following this pattern:
#if GENERATE_SOURCECODE
int main (void)
{
... Code that uses printf to write C code to stdout
}
#else
// Source code generated by the code above
... Here I paste in what the code above generated
// The rest of the program
#endif
Every time you need to change it, you run the code with GENERATE_SOURCECODE defined, and paste in the output. Works well if your code is self contained and the generated output only ever changes if the code generating it changes.
First of all, it should go without saying that you should evaluate (probably by experiment) whether this is worth doing. Your lookup table is going to increase your data size and programmer effort, but may or may not provide a runtime speed increase that you need.
If you still want to do it, I don't think the C preprocessor can do it straightforwardly, because it has no facilities for iteration or recursion.
The most robust way to go about this would be to write a program in C or some other language to print out C source for the table, and then include that file in your program using the preprocessor. If you are using a tool like make, you can create a rule to generate the table file and have your .c file depend on that file.
On the other hand, if you are sure you are never going to change this table, you could write a program to generate it once and just paste it in.

Automatically deleting unused local variables from C source code

I want to delete unused local variables from C file.
Example:
int fun(int a , int b)
{
int c,sum=0;
sum=a + b;
return sum;
}
Here the unused variable is 'c'.
I will have an external list of all unused local variables. Using that list, I need to find those local variables in the source code and delete them.
In the above example, "c" is the unused variable. I will already know that (I have code for it).
Here I have to find c and delete it.
EDIT
The point is not to find unused local variables with an external tool. The point is to remove them from code given a list of them.
Turn up your compiler warning level, and it should tell you.
Putting your source fragment in "f.c":
% gcc -c -Wall f.c
f.c: In function 'fun':
f.c:1: warning: unused variable 'c'
Tricky - you will have to parse C code for this. How close does the result have to be?
Example of what I mean:
int a, /* foo */
b, /* << the unused one */
c; /* bar */
Now, it's obvious to humans that the second comment has to go.
Slight variation:
void test(/* in */ int a, /* unused */ int b, /* out */ int* c);
Again, the second comment has to go, the one before b this time.
In general, you want to parse your input, filter it, and emit everything that's not the declaration of an unused variable. Your parser would have to preserve comments and #include statements, but if you don't #include headers it may be impossible to recognize declarations (even more so if macros are used to hide the declaration). After all, you need headers to decide if A * B(); is a function declaration (when A is a type) or a multiplication (when A is a variable).
[edit] Furthermore:
Even if you know that a variable is unused, the proper way to remove it depends a lot on remote context. For instance, assume
int foo(int a, int b, int c) { return a + b; }
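Clearly, c is unused. Can you change it to this?
int foo(int a, int b) { return a + b; }
Perhaps, but not if &foo is stored in an int(*)(int,int,int). And that may happen somewhere else. If (and only if) that happens, you should keep the parameter and just mark it as unused:
int foo(int a, int b, int c) { (void)c; return a + b; }
(Leaving the parameter unnamed, as in int foo(int a, int b, int) { ... }, is valid C++ and C23, but not earlier C.)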
Why do you want to do this? Assuming you have a decent optimizing compiler (GCC, Visual Studio et al.), the binary output will not be any different whether you remove the 'int c' in your original example or not.
If this is just about code cleanup, any recent IDE will give you quick links to the source code for each warning, just click and delete :)
My answer is more of an elaborate comment to MSalters' very thorough answer.
I would go beyond 'tricky' and say that such a tool is both impossible and inadvisable.
If you are looking to simply remove the references to the variable, then you could write a code parser of your own, but it would need to track which function each occurrence belongs to, as in
int foo(double a, double b)
{
b = 10.0;
return (int) b;
}
int bar(double a, double b)
{
a = 5.00;
return (int) a;
}
A simple parser would have trouble here: 'a' is the unused variable in foo, but 'b' is the unused one in bar.
Secondly, if you consider comments as MSalters has, you'll discover that people do not comment consistently:
double a;
/*a is designed as a dummy variable*/
double b;
/*a is designed as a dummy variable*/
double a;
double b;
double a; /*a is designed as a dummy variable*/
double b;
etc.
So simply removing the unused variables will create orphaned comments, which are arguably more dangerous than not commenting at all.
Ultimately, it is an obscenely difficult task to do elegantly, and you would be mangling code regardless. By automating the process, you would be making the code worse.
Lastly, you should be considering why the variables were in the code in the first place, and if they are deprecated, why they were not deleted when all their references were.
Use static code analysis tools in addition to raising the warning level, as Paul correctly stated.
As well as being able to reveal these through warnings, the compiler will normally optimise these away if any optimisations are turned on. Checking if a variable is never referenced is quite trivial in terms of implementation in the compiler.
You will need a good parser that preserves original character position of tokens (even in presence of preprocessor!). There are some tools for automated refactoring of C/C++, but they are far from mainstream.
I recommend you to check out Taras' Blog. The guy is doing some large automated refactorings of Mozilla codebase, like replacing out-params with return values. His main tool for code rewriting is Pork:
Pork is a C++ parsing and rewriting tool chain. The core of Pork is a C++ parser that provides exact character positions for the start and end of every AST node, as well as the set of macro expansions that contain any location. This information allows C++ to be automatically rewritten in a precise way.
From the blog:
So far pork has been used for “minor” things like renaming classes&functions, rotating outparameters and correcting prbool bugs. Additionally, Pork proved itself in an experiment which involved rewriting almost every function (ie generating a 3+MB patch) in Mozilla to use garbage collection instead of reference-counting.
It is for C++, but it may suit your needs.
One of the posters above says "impossible and inadvisable". Another says "tricky", which is the right answer.
You need 1) a full C (or whatever language of interest) parser, 2) inference procedures that understand the language's identifier references and data flows to determine that a variable is indeed "dead", and 3) the ability to actually modify the source code.
What's hard about all this is the huge energy needed to build 1), 2) and 3). You can't justify that for any individual cleanup task. What one can do is build such infrastructure specifically with the goal of amortizing it across lots of different program analysis and transformation tasks.
My company offers such a tool: the DMS Software Reengineering Toolkit. See
http://www.semdesigns.com/Products/DMS/DMSToolkit.html
DMS has production-quality front ends for many languages, including C, C++, Java and COBOL.
We have in fact built an automated "find useless declarations" tool for Java that does two things:
a) lists them all (thus producing the list!)
b) makes a copy of the code with the useless declarations removed.
You choose which answer you want to keep :-)
To do the same for C would not be difficult. We already have a tool that identifies such dead variables/functions.
One case we did not address is the "useless parameter" case, because to remove a useless parameter you have to find all the calls from other modules, verify that setting up the argument doesn't have a side effect, and rip out the useless argument. We in fact have full graphs of the entire software system of interest, so this would also be possible.
So, it's just tricky, and not even very tricky if you have the right infrastructure.
You can solve the problem as a text processing problem. There must be a small number of regexp patterns describing how unused local variables are defined in the source code.
Using a list of unused variable names and the line numbers where they appear, you can process the C source code line by line. On each line you can iterate over the variable names, and for each variable name you can try the patterns one by one. After a successful match you know the syntax of the definition, so you know how to delete the unused variable from it.
For example, if the source line is "int a, unused, b;" and the compiler reported "unused" as an unused variable on that line, then the pattern "/, unused,/" will match and you can replace that substring with a single ",".
Also: splint.
Splint is a tool for statically checking C programs for security vulnerabilities and coding mistakes. With minimal effort, Splint can be used as a better lint. If additional effort is invested adding annotations to programs, Splint can perform stronger checking than can be done by any standard lint.

Why don't we get a compile time error even if we don't include stdio.h in a C program?

How does the compiler know the prototype of sleep function or even printf function, when I did not include any header file in the first place?
Moreover, if I specify sleep(1,1,"xyz") or any arbitrary number of arguments, the compiler still compiles it.
But the strange thing is that gcc is able to find the definition of this function at link time. I don't understand how this is possible, because the actual sleep() function takes a single argument only, but the call passes three arguments.
/********************************/
int main()
{
short int i;
for(i = 0; i<5; i++)
{
printf("%d",i);
sleep(1);
}
return 0;
}
Lacking a more specific prototype, the compiler will assume that the function returns int and takes whatever number of arguments you provide.
Depending on the CPU architecture arguments can be passed in registers (for example, a0 through a3 on MIPS) or by pushing them onto the stack as in the original x86 calling convention. In either case, passing extra arguments is harmless. The called function won't use the registers passed in nor reference the extra arguments on the stack, but nothing bad happens.
Passing in fewer arguments is more problematic. The called function will use whatever garbage happened to be in the appropriate register or stack location, and hijinks may ensue.
In classic C, you don't need a prototype to call a function. The compiler will infer that the function returns an int and takes an unknown number of parameters. This may work on some architectures, but it will fail if the function returns something other than int, like a structure, or if there are any parameter conversions.
In your example, sleep is seen and the compiler assumes a prototype like
int sleep();
Note that the argument list is empty. In C, this is NOT the same as void. This actually means "unknown". If you were writing K&R C code, you could have unknown parameters through code like
int sleep(t)
int t;
{
/* do something with t */
}
This is all dangerous, especially on some embedded chips where the way parameters are passed for an unprototyped function differs from one with a prototype.
Note: prototypes aren't needed for linking. Usually, the linker automatically links with a C runtime library like glibc on Linux. The association between your use of sleep and the code that implements it happens at link time long after the source code has been processed.
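I'd suggest that you use your compiler's options to require declarations and prototypes to avoid problems like this. With GCC, -Werror=implicit-function-declaration turns calls to undeclared functions into errors, and -Wstrict-prototypes warns about declarations like int sleep(); that don't specify argument types. In the CodeWarrior tools, it was the "Require Prototypes" flag in the C/C++ Compiler panel.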
C will guess int for unknown types. So, it probably thinks sleep has this prototype:
int sleep(int);
As for giving multiple parameters and linking...I'm not sure. That does surprise me. If that really worked, then what happened at run-time?
This is to do with something called 'K & R C' and 'ANSI C'.
In good old K & R C, if something is not declared, it is assumed to be int. So anything that looks like a function call but is not declared as a function will automatically get a return type of int, with argument types depending on the actual call.
However, people later figured out that this can be very bad sometimes, so several compilers added a warning, and C++ made it an error. gcc has flags such as -Werror=implicit-function-declaration (or -pedantic-errors in C99 and later modes) that make this condition an error.
So, in a nutshell, this is historical baggage.
Other answers cover the probable mechanics (all guesses as compiler not specified).
The issue that you have is that your compiler and linker have not been set to enable every possible error and warning. For any new project there is (virtually) no excuse for not doing so. For legacy projects there is more excuse, but you should still strive to enable as many as possible.
Depends on the compiler, but with gcc (for example, since that's the one you referred to), some of the standard (both C and POSIX) functions have builtin "compiler intrinsics". This means that the compiler library shipped with your compiler (libgcc in this case) contains an implementation of the function. The compiler will allow an implicit declaration (i.e., using the function without a header), and the linker will find the implementation in the compiler library because you're probably using the compiler as a linker front-end.
Try compiling your objects with the '-c' flag (compile only, no link), and then link them directly using the linker. You will find that you get the linker errors you expect.
Alternatively, gcc supports options to disable the use of intrinsics: -fno-builtin or, for granular control, -fno-builtin-function (for example -fno-builtin-printf). There are further options that may be useful if you're doing something like building a homebrew kernel or some other kind of bare-metal app.
In a non-toy example, another header you include may itself include the one you missed. Reviewing the output of the preprocessor (e.g. gcc -E) is a nice way to see what you actually end up compiling.
