Designing a C-like compiler

I'm developing a C-like compiler and I want to know how a compiler handles system includes.
Does the compiler read the entire source file, store every include it finds in a list, and parse the included files only after it has finished reading the current file?
// file main.c
#include <stdio.h> // store in one list
// continue the parse ...
int main()
{
return 0;
}
// now, read the includes
// after finish the includes parse, gen code of sources
// just a sample
// file stdio.h
#include <types.h> // store in list
#include <bios.h> // store in list
void printf(...)
{
}
void scanf(...)
{
}
By the way, I have written a test system that reads the includes by stopping the parse of the current file to read the included one (the code is ugly, but it works).
(link to the sample) -> https://gist.github.com/4399601
What is the best way to read and work with include files?

#include, #define, #ifdef and the like are processed by a separate pass called the preprocessor. It replaces the lines with #include with the included files. The resulting temporary source text is then fed to later passes like the tokenizer and parser.

Any line in C that begins with # is handled by the preprocessor, not the compiler. The preprocessor generates a file that the compiler then compiles. The contents of the file depend on whatever is #defined by the developer and the SDK.

Anything that begins with # is a preprocessor directive. Preprocessing is the first stage of compilation: each directive is replaced by the corresponding code.
The output of the preprocessor (a .i file) is then passed to the later stages of compilation.
These later stages include the lexical analyzer, parser, optimizer, and code generator.

If I were writing a compiler from scratch, I would first consider whether handling includes is a necessary part of the language, and if so, whether YOU have to write it or could use an existing one (such as the cpp part of gcc). The "fun" part of a compiler, after all, is the real compiling of the code, not reading files and replacing strings with other strings through macro expansion [although that can be quite fun too, of course - but you can write that once you have a compiler that works!].
The tricky part with include files isn't the including itself (a fairly trivial recursive function), but the parsing of #define/#ifdef/#if/#undef and, more importantly, performing the replacements they call for.
Have fun!
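To make the "fairly trivial, recursive function" concrete, here is a minimal sketch of include expansion. It is not how gcc or any real preprocessor is written: the in-memory file table (struct vfile), the names expand_includes and parse_include, and the nesting limit are all assumptions made for illustration, and macros and conditionals are ignored entirely.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory "file system": a file name and its text. */
struct vfile {
    const char *name;
    const char *text;
};

#define MAX_DEPTH 16 /* assumed nesting limit; also stops include cycles */

static const char *find_text(const struct vfile *fs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(fs[i].name, name) == 0)
            return fs[i].text;
    return NULL;
}

/* Append len bytes of s to the NUL-terminated buffer out; -1 if it is full. */
static int append(char *out, size_t cap, const char *s, size_t len)
{
    size_t used = strlen(out);
    if (used + len + 1 > cap)
        return -1;
    memcpy(out + used, s, len);
    out[used + len] = '\0';
    return 0;
}

/* Return 1 and fill inc if the line is #include "x" or #include <x>.
   Lines longer than 255 characters are treated as ordinary text. */
static int parse_include(const char *line, size_t len, char inc[64])
{
    char buf[256];
    if (len >= sizeof buf)
        return 0;
    memcpy(buf, line, len);
    buf[len] = '\0';
    return sscanf(buf, " # include \"%63[^\"\n]\"", inc) == 1 ||
           sscanf(buf, " # include <%63[^>\n]>", inc) == 1;
}

/* Copy file `name` into out, replacing every #include line with the
   expanded text of the file it names. out must start as an empty string.
   Returns 0 on success, -1 on a missing file, overflow, or deep nesting. */
int expand_includes(const struct vfile *fs, size_t n, const char *name,
                    char *out, size_t cap, int depth)
{
    const char *p = find_text(fs, n, name);
    if (p == NULL || depth > MAX_DEPTH)
        return -1;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) + 1 : strlen(p);
        char inc[64];

        if (parse_include(p, len, inc)) {
            /* Recurse: the included file may include others. */
            if (expand_includes(fs, n, inc, out, cap, depth + 1) != 0)
                return -1;
        } else if (append(out, cap, p, len) != 0) {
            return -1;
        }
        p += len;
    }
    return 0;
}
```

The expanded text is what a later tokenizer and parser would consume. A real implementation would read from disk and track file names and line numbers for diagnostics.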

Related

Why is it sometimes valid to write an include statement in a codeblock?

I am coming from a Python background, where the following is valid:
def f():
import someLibrary
someLibrary.libraryFunction()
So when I needed to debug C code, I wrote, in the middle of my function:
void f(int param)
{
int status;
/* other code */
#include <stdio.h>
printf("status: %d", status);
/* more code */
}
And it compiled and worked as I expected. Later it was pointed out to me that this shouldn't compile, since the C pre-processor literally replaces the #include statement with the contents of stdio.h.
So why was this valid code?
it was pointed out to me that this shouldn't compile, since the C pre-processor literally replaces the #include statement with the contents of stdio.h.
The logic on that doesn't make sense. Just because the pre-processor inserts the text from the stdio.h file doesn't mean it should not compile. If there's nothing in that file that would result in a compile error, then it will compile just fine.
Furthermore, headers usually have a multiple inclusion guard in them. So if they were included already previously, any further attempts to include it have no effect. In this case, if <stdio.h> was already included previously in the file (directly or indirectly), the #include will have no effect.
With that being said, don't do that though. In C, standard headers are not supposed to be included while inside a function scope.
Yeah, C and Python are pretty different in this respect.
It is correct that the preprocessor replaces the #include directive with the contents of the included file prior to compilation.
Whether it leads to a compilation error or not depends entirely on the contents of the included file. Standard headers like stdio.h don't contain any executable statements - they only contain things like typedefs, function declarations, other macros, etc. They also usually have some kind of #include guards in place that prevent them from being loaded more than once per translation unit (that is, if you #include a file that includes stdio.h, and then #include <stdio.h> directly in the same source file, the contents of stdio.h will only be loaded once).
Theoretically, there's no problem with including stdio.h at random points in the code, but it can lead to problems. In this case all of stdio.h's contents will only be visible to the body of f - not a problem if only f needs to use anything in stdio.h, but otherwise it will lead to headaches.
Standard headers are best included at the beginning of the source file.
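The scoping point above can be illustrated without including a header at all. This is only a sketch: the function name magnitude is made up, and the hand-written declaration of abs stands in for what an #include inside the body would paste there.

```c
/* A declaration placed inside a function body is visible only in that block.
   This is roughly what is left behind when a header is included in a body. */
int magnitude(int v)
{
    int abs(int); /* block-scope declaration of the library function */
    return abs(v);
}

/* Code after this point cannot rely on the declaration above: it went out of
   scope at the closing brace, just as stdio.h's contents would for f. */
```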

Why do all the C files written by my lecturer start with a single # on the first line?

I'm going through some C course notes, and every C program source file begins with a single # on the first line of the program.
Then there are blank lines, and following that other stuff followed by the main function.
What is the reason for the #?
(It's out of term now and I can't really ask the chap.)
Here's an example:
#
#include <stdio.h>
int main() {
printf("Hello, World!");
return 0;
}
Wow, this requirement goes way back to the 1970s.
In the very early days of pre-standardised C, if you wanted to invoke the preprocessor, then you had to write a # as the first thing in the first line of a source file. Writing only a # at the top of the file affords flexibility in the placement of the other preprocessor directives.
From an original C draft by the great Dennis Ritchie himself:
12. Compiler control lines
[...] In order to cause [the] preprocessor to be invoked, it is necessary that the very
first line of the program begin with #. Since null lines are ignored by the preprocessor, this line need contain no other
information.
That document makes for great reading (and allowed me to jump on this question like a mad cat).
I suspect it's the lecturer simply being sentimental - it hasn't been required certainly since ANSI C.
It Does Nothing
As of the ISO standard of C/C++:
A preprocessing directive of the form
# new-line
has no effect.
So in today's compilers, that empty hash does nothing (much as a lone ; on a line of its own has no effect).
PS: In pre-standardized C, # new-line had an important role: it was used to invoke the C preprocessor (as pointed out by @Bathsheba). So the code here was either written in that period or reflects the programmer's habit.
Edit: recently I have come across code like this:
#ifdef ANDROID
#
#define DEVICE_TAG "ANDROID"
#define DEBUG_ENABLED
#
#else
#
#define DEVICE_TAG "NOT_ANDROID"
#
#endif /* ANDROID */
Here, those empty hashes are there only for making the code look good. It also improves readability by indicating that it is a preprocessor block.
You need to know about the C compilation process, because it explains how source code is converted into an executable binary file.
During compilation, the C source code first passes through the preprocessor. But how do you tell the compiler what to preprocess? That is why the # symbol was introduced: it marks a line as a preprocessor directive.
For example, suppose #define PI 3.141 appears in the source code. After preprocessing, every occurrence of PI is replaced with 3.141.
Likewise, with #include <stdio.h>, the declarations of the standard I/O functions are added to your source code.
If you have a Linux machine, compile with gcc -save-temps source_code.c and look at the files the compiler leaves behind.
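A small sketch of the PI substitution described above; the function name circle_area is an assumption made for illustration. The lone # on the first line is the null directive discussed in this thread and has no effect.

```c
#
#define PI 3.141

double circle_area(double r)
{
    /* After preprocessing, the compiler proper sees: return 3.141 * r * r; */
    return PI * r * r;
}
```

Compiling a file like this with gcc -save-temps should leave a .i file in which PI no longer appears in the function body.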

What happens when preprocessor lines are processed by the preprocessor? - the '.i' file

I am using the GNU C compiler (gcc) to compile my C programs. Consider this program:
#include <stdio.h>
int main(){
return 0;
}
Now, when I pre-process the above code, using
cpp sample.c > sample.i
I get a lot of content in sample.i that I didn't write. Presumably the 'stdio.h' file has been preprocessed. If that is the case:
Question 1:
Why are there so many lines in my preprocessed file? I haven't used any of the standard library functions nor Macros.
Question 2:
Can anyone explain what exactly happens when the preprocessor processes the C file? (The contents that I got in my '*.i' file.)
Compiler: gcc
OS: Ubuntu
Thanks
Why are there so many lines in my preprocessed file? I haven't used any of the standard library functions nor Macros.
Preprocessing is just one part of the compilation process. It's more or less a simple textual replacement, and nothing more complex is involved at the preprocessing stage. The preprocessor does not know or care whether you have used any standard functions in your program. An optimizer (as part of the compilation process) might "remove" parts that are not needed, but the preprocessor doesn't do that.
It preprocesses all the header files you have included, the header files those include, and so on.
Can anyone explain what exactly happens when the preprocessor processes the C file? (The contents that I got in my '*.i' file.)
The preprocessing involves quite a few tasks: macro replacement, conditional compilation, stringification, string concatenation etc.
You can read more about cpp in detail here: https://gcc.gnu.org/onlinedocs/cpp/
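As a rough illustration of three of the tasks listed above (macro replacement, stringification with #, and conditional compilation), plus token pasting with ##, here is a minimal sketch. All macro and function names are made up for the example, and USE_WIDE_COUNTER is assumed not to be defined.

```c
#include <string.h>

#define TABLE_SIZE 4
#define STR(x)  #x        /* stringify the argument as written */
#define XSTR(x) STR(x)    /* expand the argument first, then stringify */
#define CAT(a, b) a##b    /* paste two tokens into one */

/* Conditional compilation: only one typedef survives preprocessing. */
#ifdef USE_WIDE_COUNTER
typedef long counter_t;
#else
typedef int counter_t;
#endif

/* STR sees the unexpanded name, so the result is "TABLE_SIZE". */
int stringified_name_matches(void)
{
    return strcmp(STR(TABLE_SIZE), "TABLE_SIZE") == 0;
}

/* XSTR expands TABLE_SIZE to 4 before stringifying, giving "4". */
int stringified_value_matches(void)
{
    return strcmp(XSTR(TABLE_SIZE), "4") == 0;
}

counter_t pasted_identifier(void)
{
    counter_t CAT(total_, count) = TABLE_SIZE; /* declares total_count */
    return total_count;
}
```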
The preprocessor directive #include "aFile.h" puts the whole content of aFile.h into your source file, at exactly the place where the directive stands. That is why you can use the functions declared in aFile.h.
If you are interested in learning more about the preprocessor, there is a very good (and short) guide on cplusplus.com.
The preprocessor does text substitution. The net effect of #include <stdio.h> is to replace the #include <stdio.h> line with the contents of <stdio.h>.
Practically, <stdio.h> contains several declarations of various functions (e.g. fprintf(), fscanf()), declarations of variables (e.g. stdout, stdin), and some macro definitions (which, when used in later code, cause text substitution).
The preprocessor is specified as a phase of compilation, which takes source code as input, substitutes text as required (e.g. the #include as I have described, macro expansions, etc), and outputs the resultant source code. That output is what you are directing into sample.i
The output of the preprocessor is then input to a later phase of compilation, which actually understands declarations, definitions, statements, etc.
The phases of compilation are sequential - they occur one after the other, not all at once. So the later phase of compilation feeds no information whatsoever back to the preprocessor. It is the later phase of compilation that detects if declarations etc are used. But, since it cannot feed such information back to the preprocessor (and the preprocessor is an ignorant program that couldn't use such information anyway) the preprocessor cannot know that declarations are unused, and filter them out.
1) You may not use them, but you have included them in line 1
#include <stdio.h>
That's where what you see comes from. Try removing it to see the difference.
2) The preprocessor reads your C file and processes every preprocessor directive in it. All preprocessor directives start with a '#' symbol. '#include' replaces its line with the content of the given file. There are also the classic '#ifndef' and '#define' directives. The former works like an 'if' statement: it lets you activate a part of the code only if a symbol is not defined.
#ifndef _SOME_SYMBOL_
#define _SOME_SYMBOL_
#ifndef WIN32
#include <some_file.h>
#else
#include <some_other_file.h>
#endif
int main() { return 0;}
#endif //endof _SOME_SYMBOL_
#ifndef _SOME_SYMBOL_
#define _SOME_SYMBOL_
// this second function is ignored
int main() { return 0;}
#endif //endof _SOME_SYMBOL_
When the preprocessor reads the above file, the symbol "_SOME_SYMBOL_" is unknown, so the preprocessor defines it. Next it includes one file or the other, depending on whether WIN32 is defined. Usually this kind of symbol is passed through the command line, so part of your code is dynamically activated or deactivated.
The preprocessor will output this:
void some_other_function_from_some_other_file(){}
int main() { return 0;}

How to make GCC evaluate functions at compile time?

I am thinking about the following problem: I want to program a microcontroller (let's say an AVR mega type) with a program that uses some sort of look-up tables.
The first attempt would be to locate the table in a separate file and create it using any other scripting language/program/.... In this case there is quite some effort in creating the necessary source files for C.
My thought was now to use the preprocessor and compiler to handle things. I tried to implement this with a table of sine values (just as an example):
#include <avr/io.h>
#include <math.h>
#define S1(i,n) ((uint8_t) sin(M_PI*(i)/n*255))
#define S4(i,n) S1(i,n), S1(i+1,n), S1(i+2,n), S1(i+3,n)
uint8_t lut[] = {S4(0,4)};
void main()
{
uint8_t val, i;
for(i=0; i<4; i++)
{
val = lut[i];
}
}
If I compile this code I get warnings about the sin function. Furthermore, the assembly has nothing in the .data section. If I just remove the sin in the third line, I get the data in the assembly. Clearly all the information is available at compile time.
Can you tell me if there is a way to achieve what I intend, namely that the compiler calculates as many values as possible offline? Or is the best way to use an external script/program/... to calculate the table entries and add them to a separate file that is simply #included?
The general problem here is that the sin call makes this initialization de facto illegal according to the rules of the C language: it is not a constant expression, and you are initializing an array of static storage duration, which requires one. This also explains why your array is not in the .data section.
C11 (N1570) §6.6/2,3 Constant expressions (emphasis mine)
A constant expression can be evaluated during translation rather than
runtime, and accordingly may be used in any place that a constant may
be.
Constant expressions shall not contain assignment, increment,
decrement, function-call, or comma operators, except when they are
contained within a subexpression that is not evaluated.115)
However, as @ShafikYaghmour's comment points out, GCC will replace the sin function call with its built-in counterpart (unless the -fno-builtin option is present), which is likely to be treated as a constant expression. According to 6.57 Other Built-in Functions Provided by GCC:
GCC includes built-in versions of many of the functions in the
standard C library. The versions prefixed with __builtin_ are always
treated as having the same meaning as the C library function even if
you specify the -fno-builtin option.
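For contrast, the same macro pattern is plain standard C when each entry uses only integer arithmetic, because every initializer is then a constant expression. The sketch below swaps the sine formula for squares purely for illustration; the names SQ1, SQ4, squares and square_lookup are assumptions, not part of the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Each entry is an integer constant expression: no function call involved. */
#define SQ1(i) ((uint8_t)((i) * (i)))
#define SQ4(i) SQ1(i), SQ1((i) + 1), SQ1((i) + 2), SQ1((i) + 3)

/* Static storage duration, fully initialized at translation time. */
static const uint8_t squares[] = { SQ4(0), SQ4(4) };

#define SQUARES_LEN (sizeof squares / sizeof squares[0])

/* Return the table entry, or -1 if the index is out of range. */
int square_lookup(size_t i)
{
    return i < SQUARES_LEN ? squares[i] : -1;
}
```

Whether a call such as sin is folded the same way depends on the compiler and its options, as the answer above notes.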
What you are trying is not part of the C language. In situations like this, I have written code following this pattern:
#if GENERATE_SOURCECODE
int main (void)
{
... Code that uses printf to write C code to stdout
}
#else
// Source code generated by the code above
... Here I paste in what the code above generated
// The rest of the program
#endif
Every time you need to change it, you run the code with GENERATE_SOURCECODE defined, and paste in the output. Works well if your code is self contained and the generated output only ever changes if the code generating it changes.
First of all, it should go without saying that you should evaluate (probably by experiment) whether this is worth doing. Your lookup table is going to increase your data size and programmer effort, but may or may not provide a runtime speed increase that you need.
If you still want to do it, I don't think the C preprocessor can do it straightforwardly, because it has no facilities for iteration or recursion.
The most robust way to go about this would be to write a program in C or some other language to print out C source for the table, and then include that file in your program using the preprocessor. If you are using a tool like make, you can create a rule to generate the table file and have your .c file depend on that file.
On the other hand, if you are sure you are never going to change this table, you could write a program to generate it once and just paste it in.
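The generator approach described above might look like the following sketch. The names format_table, entry_fn and ramp_entry are hypothetical. The linear ramp is only a stand-in for the real formula: a sine-based entry function would use sin() from <math.h>, which typically needs -lm at link time.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback: value of table slot i out of n, in 0..255. */
typedef unsigned (*entry_fn)(int i, int n);

/* Stand-in formula: a linear ramp from 0 to 255. */
unsigned ramp_entry(int i, int n)
{
    return n > 1 ? (unsigned)(255 * i / (n - 1)) : 0u;
}

/* Write "const uint8_t NAME[N] = { ... };" as C source text into buf.
   Returns 0 on success, -1 if buf is too small. */
int format_table(char *buf, size_t cap, const char *name, int n, entry_fn fn)
{
    size_t used;
    int w = snprintf(buf, cap, "const uint8_t %s[%d] = {", name, n);
    if (w < 0 || (size_t)w >= cap)
        return -1;
    used = (size_t)w;

    for (int i = 0; i < n; i++) {
        w = snprintf(buf + used, cap - used, "%s%u",
                     i ? ", " : " ", fn(i, n) & 0xFFu);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }

    w = snprintf(buf + used, cap - used, " };\n");
    if (w < 0 || (size_t)w >= cap - used)
        return -1;
    return 0;
}
```

A small main that prints the buffer to stdout, run from a make rule that redirects the output into a header, would then give the .c file something to #include.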

What does #include actually do?

In C (or a language based on C), one can happily use this statement:
#include "hello.h";
And voila, every function and variable in hello.h is automagically usable.
But what does it actually do? I looked through compiler docs and tutorials and spent some time searching online, but the only impression I could form about the magical #include command is that it "copy pastes" the contents of hello.h instead of that line. There's gotta be more than that.
Logically, that copy/paste is exactly what happens. I'm afraid there isn't any more to it. You don't need the ;, though.
Your specific example is covered by the spec, section 6.10.2 Source file inclusion, paragraph 3:
A preprocessing directive of the form
# include "q-char-sequence" new-line
causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters.
That (copy/paste) is exactly what #include "header.h" does.
Note that it will be different for #include <header.h> or when the compiler can't find the file "header.h" and it tries to #include <header.h> instead.
Not really, no. The compiler saves the original file descriptor on a stack and opens the #included file; when it reaches the end of that file, it closes it and pops back to the original file descriptor. That way, it can nest #included files almost arbitrarily.
The # include statement "grabs the attention" of the pre-processor (the process that occurs before your program is actually compiled) and "tells" the pre-processor to include whatever follows the # include statement.
While the pre-processor can be told to do quite a bit, in this instance it's being asked to recognize a header file (which is denoted with a .h following the name of that header, indicating that it's a header).
Now, a header is a file containing C declarations and definitions of functions not explicitly defined in your code. What does this mean? Well, if you want to use a function or define a special type of variable, and you know that these functions/definition are defined elsewhere (say, the standard library), you can just include (# include) the header that you know contains what you need. Otherwise, every time you wanted to use a print function (like in your case), you'd have to recreate the print function.
If it's not explicitly defined in your code and you don't #include the header file with the function you're using, your compiler will complain, saying something like: "Hey! I don't see where this function is defined, so I don't know what to do with this undefined function in your code!".
It's part of the preprocessor. Have a look at http://en.wikipedia.org/wiki/C_preprocessor#Including_files. And yes, it's just copy and paste.
This is a nice link to answer this question.
http://msdn.microsoft.com/en-us/library/36k2cdd4.aspx
Usually #include <path-name> and #include "path-name" differ only in the order in which the preprocessor searches directories.
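The difference in search order can be sketched as below. The C standard leaves the search rules implementation-defined, so this only mirrors the commonly documented behavior (quoted form: the including file's directory first, then the system directories; angle form: system directories only). The function name include_candidate and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the idx-th path to try for an include into out.
   quoted is nonzero for #include "name", zero for #include <name>.
   Returns 1 if a candidate was written, 0 if idx is past the last one
   or out is too small. */
int include_candidate(int quoted, const char *including_dir,
                      const char *const *sys_dirs, size_t nsys,
                      const char *name, size_t idx,
                      char *out, size_t cap)
{
    const char *dir;

    if (quoted) {
        if (idx == 0)
            dir = including_dir;      /* directory of the including file */
        else if (idx - 1 < nsys)
            dir = sys_dirs[idx - 1];  /* then fall back to system dirs */
        else
            return 0;
    } else {
        if (idx >= nsys)
            return 0;
        dir = sys_dirs[idx];          /* system directories only */
    }

    int w = snprintf(out, cap, "%s/%s", dir, name);
    return w >= 0 && (size_t)w < cap;
}
```

A preprocessor would try each candidate in turn and open the first one that exists.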
