How do preprocessor directives work in C?

I am going through the book Let Us C by Yashwant Kanetkar, which states:
When we compile a program, before the source code passes to the compiler, it is examined by the C preprocessor for any macro definition. When it sees the #define directive, it goes through the entire program in search of macro templates; wherever it finds one, it replaces the macro template with the appropriate macro expansion. Only after this procedure has been completed, is the program handed over to the compiler.
My question is: before the program is passed to the compiler, how is the preprocessor able to read the tokens corresponding to the macro templates? Is the preprocessor also able to divide the program into tokens?

That description is confusing (so I won't recommend that book; read instead the K&R book, The C Programming Language). The preprocessor does not go back through the entire program; when it reaches a directive, it has already processed the input that came before it. Only the input preprocessed so far matters for the behavior of the preprocessor (in other words, the preprocessor is a single-pass mechanism).
Read the Wikipedia page on the C preprocessor, then the documentation of GNU cpp and of other preprocessors, and the Wikibooks chapter C Programming/Preprocessor.
In current C compilers (for performance reasons) the preprocessor is no longer a separate program; it is part of the compiler itself. For recent GCC, look into libcpp/ (its preprocessor library, internal to the compiler).
If using the GCC compiler, you can get the preprocessed form of your source code file csource.c by running gcc -C -E csource.c > csource.i then looking inside the generated preprocessed form csource.i (e.g. with a pager or an editor).
(I strongly recommend doing that once in a while; you'll learn a lot; and yes, you could be surprised by the amount of code pulled by a usual #include <stdio.h> directive)
I believe your book explains this wrongly. The preprocessor handles every preprocessing directive as it reaches it. When it encounters a #define, it stores the definition of that symbol in a preprocessor symbol table. When it later encounters an occurrence of that symbol, it performs the appropriate substitution.

In the K&R book, The C Programming Language, page 88:
C provides certain language facilities by means of a preprocessor, which is conceptually a separate first step in compilation.
In Compilers: Principles, Techniques, and Tools by Aho, Lam, Sethi and Ullman, page 3:
The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor. The preprocessor may also expand shorthands, called macros, into source language statements. The modified source program is then fed to a compiler.
In the GNU GCC documentation:
The C preprocessor is a macro processor that is used automatically by the C compiler to transform your program before actual compilation.
And read this too.
So, from these three sources, one can say that the preprocessor is a separate program run by the compiler. The statement in Let Us C by Yashwant P. Kanetkar that the preprocessor is a program that processes the source before the compiler, as its name suggests, is therefore not wrong, and the expanded code can be seen in file.i.
Now let's come to your question.
In the K&R book, The C Programming Language, page 89:
Substitutions are made only for tokens, and do not take place within quoted strings.
and, as Basile said in his answer,
In current C compilers (for performance reasons) the preprocessor is no longer a separate program; it is part of the compiler itself.
Compiling is a long process that passes through several phases. The preprocessor actually runs after the program has been split into tokens. The sources call it a step before compilation because it is done before any kind of intermediate code generation. And yes, breaking the program into tokens is the first step of the compiler, before any intermediate code generation.

Related

paste operator in macros

I found the following snippet of code:
#include <stdio.h>

#define f(g,g2) g##g2

int main(void) {
    int var12 = 100;
    printf("%d", f(var,12));  /* f(var,12) expands to var12 */
    return 0;
}
I understand that this will translate f(var,12) into var12.
My question is: in the macro definition, why didn't they just write the following:
#define f(g,g2) gg2
Why do we need ## to concatenate text, rather than concatenating it ourselves?
If one writes gg2, the preprocessor will perceive that as a single token. The preprocessor cannot tell that it is meant to be the concatenation of g and g2.
#define f(g,g2) g##g2
My opinion is that this is poor, unreadable code. It needs at least a comment (giving some motivation, explanation, etc.), and a short name like f is meaningless.
My question is: in the macro definition, why didn't they just write the following:
#define f(g,g2) gg2
With such a macro definition, f(x,y) would still be expanded to the token gg2, even if the author wanted the expansion to be xy.
Please take time to read e.g. the documentation of GNU cpp (and of your compiler, perhaps GCC) and later some C standard like n1570 or better.
Consider also designing your software by (in some cases) generating C code (inspired by GNU bison, or GNU m4, or GPP). Your build machinery (e.g. your Makefile for GNU make) would process that as you want. In some cases (e.g. programs running for hours of CPU time), you might consider doing some partial evaluation and generating specialized code at runtime (for example, with libgccjit or GNU lightning). Pitrat's book Artificial Beings: The Conscience of a Conscious Machine explains and argues for that idea at length.
Don't forget to enable all warnings and debug info in your compiler (e.g. with GCC use gcc -Wall -Wextra -g) and learn to use a debugger (like GNU gdb).
On Linux systems I sometimes like to generate some (more or less temporary) C code at runtime (from some kind of abstract syntax tree), then compile that code as a plugin, and dlopen(3) that plugin then dlsym(3) inside it. For a stupid example, see my manydl.c program (to demonstrate that you can generate hundreds of thousands of C files and plugins in the same program). For serious examples, read books.
You might also read books about Common Lisp or about Rust; both have a much richer macro system than C provides.

Do C preprocessing directives belong to the C programming language?

Do C preprocessing directives belong to the C programming language?
I think they don't because they are processed by a C preprocessor instead of an actual C compiler.
Thanks.
Yes, but only insofar as they are discussed in section 6.10 of the C99 (or later) C standard. The standard is (likely intentionally) vague about the preprocessor, only discussing things that it should do, not defining a list of things that it may or must do.
Could you create a compiler for standard C that does not have a preprocessor? Certainly, though it would be very inconvenient to use libraries.
The C Standard precisely defines the behaviour of the preprocessing phase. So it is definitely a part of the C language.
It's normal for implementations to deliver separate binaries for preprocessing, compiling and linking. The standard is written in such a way that each translation phase could be performed by a separate executable. But it's not a requirement. In fact most compilers also allow all of those things to be done via a single command (e.g. gcc foo.c bar.c).

Preprocessor-like substitution into a parser

I am currently writing a parser whose aim is to read input data into a program.
The syntax is greatly inspired by C.
I would like to reproduce a kind of preprocessor inline substitution in it.
For example:
#define HELLO ((variable1 + variable2 + variable3))
int variable1 = 37;
int variable2 = 82;
int variable3 = 928;
Thing is... I'm actually using C. I'm also using the standard functions from stdio.h to parse my files.
So... what techniques could I use to make this work correctly and efficiently?
Do standard compilers substitute the text by re-copying the stream buffer and making the substitution as the copy proceeds, or what? Are there more efficient techniques?
I guess we say preprocessor because it first substitutes everything until there are no preprocessor directives left (a recursive approach, maybe?), and only then starts doing the real compile job?
Excuse my lack of knowledge!
Thanks!
No, modern C compilers don't implement the preprocessor as a text processor; instead, the different compiler phases (preprocessing being one of them) are intertwined. This is particularly important for the efficiency of the compiler itself and for tracking errors back to the original source code.
Also, implementing a preprocessor yourself is a tedious task. Think twice before you start such a project.
Yes, you are right about preprocessors. The preprocessor has the job of bringing together all the files the program requires (e.g. stdio.h) into one file. It then lets the compiler compile the program. The file you want to compile is given as an argument to the compiler, and the techniques used by the compiler may vary according to the OS and the compiler itself.
The C preprocessor works on tokens not text. In particular, macro expansion cannot contain preprocessor directives. Other preprocessors, such as m4, work differently.

Is there any C preprocessor as an independent program?

I know that C preprocessor exists as part of compiler. But I'm looking for an independent program. Is there any such tool?
It's often called cpp. For example, on my Linux box:
CPP(1) GNU CPP(1)
NAME
cpp - The C Preprocessor
SYNOPSIS
cpp [-Dmacro[=defn]...] [-Umacro]
[-Idir...] [-iquotedir...]
[-Wwarn...]
[-M|-MM] [-MG] [-MF filename]
[-MP] [-MQ target...]
[-MT target...]
[-P] [-fno-working-directory]
[-x language] [-std=standard]
infile outfile
This particular one is part of gcc and is available for a wide variety of platforms.
mcpp.
From the homepage:
mcpp is a C/C++ preprocessor with the following features.
Implements all of C90, C99 and C++98 specifications.
Provides a validation suite to test C/C++ preprocessor's conformance and quality comprehensively. When this validation suite is applied, mcpp distinguishes itself among many existing preprocessors.
Has plentiful and on-target diagnostics to check all the preprocessing problems such as latent bug or lack of portability in source code.
Has #pragma directives to output debugging information.
Is portable and has been ported to many compiler-systems, including GCC and Visual C++, on UNIX-like systems and Windows.
Has various behavior modes.
Can be built either as a compiler-specific preprocessor to replace the resident preprocessor of a particular compiler system, or as a compiler-independent command, or even as a subroutine called from some other main program.
Provides comprehensive documents both in Japanese and in English.
Is an open source software released under BSD-style-license.
You can also have a look at m4
What is m4?
M4 can be called a “template language”, a “macro language” or a “preprocessor language”. The name “m4” also refers to the program which processes texts in this language: this “preprocessor” or “macro processor” takes as input an m4 template and sends this to the output, after acting on any embedded directives, called macros.
I've used filepp for preprocessing files other than straight C. It's a Perl module, so it's pretty portable. It's handy in that you can use all the familiar idioms you are used to, and adds some useful features.
From the web site:
Why filepp and not plain old cpp?
cpp is designed specifically to generate output for the C compiler. Yes, you can use any file type with it, but the output it creates includes loads of blank lines and lines of the style:
# 1 "file.c"
Obviously these lines are very useful to the C-compiler, but no use in say an HTML file. Also, as filepp is written in Perl, it is 8-bit clean and so works on any character set, not just ASCII characters. filepp is also customisable and hopefully more user friendly than cpp.
cpp is one such program. It's a separate program called by gcc when compiling.
It is a part of the package, and usually called cpp (C PreProcessor).
which cpp
# /usr/bin/cpp
man cpp

What C preprocessor macros have already been defined in gcc?

In gcc, how can I check what C preprocessor definitions are in place during the compilation of a C program, in particular what standard or platform-specific macro definitions are defined?
Predefined macros depend on the standard and the way the compiler implements it.
For GCC: http://gcc.gnu.org/onlinedocs/cpp/Predefined-Macros.html
For Microsoft Visual Studio 8: http://msdn.microsoft.com/en-us/library/b0084kay(VS.80).aspx
This Wikipedia page http://en.wikipedia.org/wiki/C_preprocessor#Compiler-specific_predefined_macros shows how to dump some of the predefined macros (with GCC on a Unix-like shell, for instance, gcc -dM -E - < /dev/null prints them).
A likely source of the predefined macros for a specific combination of compiler and platform is the Predef project at Sourceforge. They are attempting to maintain a catalog of all predefined macros in all C and C++ compilers on all platforms. In practice, they have coverage of a fair number of platforms for GCC, and a smattering of other compilers.
They achieved this through a combination of careful reading of documentation, as well as a shell script that figures out what macros are predefined the hard way: it tries them. My understanding is that it actually tries every string it can find in the executable image of the compiler and/or preprocessor to see if it has a predefined meaning.
They will happily add any info they don't have yet to their database.
A program may define a macro at one point, remove that definition later, and then provide a different definition after that. Thus, at different points in the program, a macro may have different definitions, or have no definition at all.

Resources