C preprocessor library - c

I have the task of developing a source analysis tool for C programs, and I need to preprocess the code before the analysis itself. I was wondering what the best library for this is. I need something lightweight and portable.

Instead of rolling your own, why not use cpp, which is part of the gcc suite: http://gcc.gnu.org/onlinedocs/gcc-4.6.1/cpp/
CPP(1) GNU CPP(1)
NAME
cpp - The C Preprocessor
SYNOPSIS
cpp [-Dmacro[=defn]...] [-Umacro]
[-Idir...] [-iquotedir...]
[-Wwarn...]
[-M|-MM] [-MG] [-MF filename]
[-MP] [-MQ target...]
[-MT target...]
[-P] [-fno-working-directory]
[-x language] [-std=standard]
infile outfile
Only the most useful options are listed here; see below for the
remainder.
DESCRIPTION
The C preprocessor, often known as cpp, is a macro processor that is
used automatically by the C compiler to transform your program before
compilation. It is called a macro processor because it allows you to
define macros, which are brief abbreviations for longer constructs.
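If the analysis tool only needs the preprocessed text, one option is simply to shell out to cpp and read its output. Below is a minimal sketch of that approach, assuming a POSIX environment where popen(3) is available; the file name input.c and the exact cpp options are placeholders, not a recommendation:

#include <stdio.h>

int main(void) {
    /* Run cpp with linemarkers suppressed (-P) and read its output
       line by line; each line would then be handed to the analysis. */
    FILE *p = popen("cpp -P input.c", "r");
    if (!p) { perror("popen"); return 1; }

    char line[4096];
    while (fgets(line, sizeof line, p))
        fputs(line, stdout);

    return pclose(p) == 0 ? 0 : 1;
}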

Related

paste operator in macros

I found the following snippet of code.
#include <stdio.h>

#define f(g,g2) g##g2

int main(void) {
    int var12 = 100;
    printf("%d", f(var,12));
    return 0;
}
I understand that this will translate f(var,12) into var12.
My question is: in the macro definition, why didn't they just write the following:
#define f(g,g2) gg2
Why do we need ## to concatenate the text, rather than concatenating it ourselves?
If one writes gg2, the preprocessor will perceive that as a single token. It has no way of knowing that it is meant to be the concatenation of g and g2.
#define f(g,g2) g##g2
My opinion is that this is poor, unreadable code. It needs at least a comment (giving some motivation, explanation, etc.), and a short name like f is meaningless.
My question is: in the macro definition, why didn't they just write the following:
#define f(g,g2) gg2
With such a macro definition, f(x,y) would still be expanded to the token gg2, even if the author wanted the expansion to be xy.
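To make the difference concrete, here is a small sketch contrasting the two definitions (the macro names CONCAT_WRONG and CONCAT_OK are mine, not from the question):

#include <stdio.h>

#define CONCAT_WRONG(g,g2) gg2      /* always expands to the single token gg2 */
#define CONCAT_OK(g,g2)    g##g2    /* pastes its two arguments into one token */

int main(void) {
    int var12 = 100;
    /* int x = CONCAT_WRONG(var, 12);   error: 'gg2' undeclared */
    printf("%d\n", CONCAT_OK(var, 12)); /* expands to var12, prints 100 */
    return 0;
}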
Please take time to read e.g. the documentation of GNU cpp (and of your compiler, perhaps GCC) and later some C standard like n1570 or better.
Consider also designing your software by (in some cases) generating C code (inspired by GNU bison, or GNU m4, or GPP). Your build machinery (e.g. your Makefile for GNU make) would process that as you want. In some cases (e.g. programs running for hours of CPU time), you might consider doing some partial evaluation and generating specialized code at runtime (for example, with libgccjit or GNU lightning). Pitrat's book Artificial Beings: The Conscience of a Conscious Machine explains and argues for that idea at book length.
Don't forget to enable all warnings and debug info in your compiler (e.g. with GCC use gcc -Wall -Wextra -g) and learn to use a debugger (like GNU gdb).
On Linux systems I sometimes like to generate some (more or less temporary) C code at runtime (from some kind of abstract syntax tree), then compile that code as a plugin, and dlopen(3) that plugin then dlsym(3) inside it. For a stupid example, see my manydl.c program (to demonstrate that you can generate hundreds of thousands of C files and plugins in the same program). For serious examples, read books.
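As a rough illustration of that dlopen/dlsym pattern (the plugin path /tmp/plugin.so and the symbol name plugin_entry are hypothetical; on Linux, link with -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Load the freshly generated and compiled plugin. */
    void *handle = dlopen("/tmp/plugin.so", RTLD_NOW);
    if (!handle) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }

    /* Look up a function inside it, then call it. */
    int (*entry)(void) = (int (*)(void)) dlsym(handle, "plugin_entry");
    if (!entry) { fprintf(stderr, "dlsym: %s\n", dlerror()); return 1; }

    printf("plugin returned %d\n", entry());
    dlclose(handle);
    return 0;
}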
You might also read books about Common Lisp or about Rust; both have a much richer macro system than C provides.

How do preprocessor directives work in C?

I am going through the book Let Us C by Yashwant Kanetkar, where it states:
When we compile a program, before the source code passes to the compiler, it is examined by the C preprocessor for any macro definition. When it sees the #define directive, it goes through the entire program in search of macro templates; wherever it finds one, it replaces the macro template with the appropriate macro expansion. Only after this procedure has been completed is the program handed over to the compiler.
My question is: before the program is passed to the compiler, how is the preprocessor able to read the tokens corresponding to the macro templates? Is the preprocessor also able to divide the program into tokens?
That description is confusing (so I won't recommend that book; read instead the K&R book The C Programming Language). The preprocessor does not go through the entire program; only the input it has already processed matters for its behavior (in other words, the preprocessor is a single-pass mechanism).
Read the wikipage on the C preprocessor, then read the documentation of GNU cpp and other documentation on the preprocessor, and the wikibook chapter on C Programming/Preprocessor.
In current C compilers (for performance reasons) the preprocessor is no longer a separate program, it is part of the compiler itself. For recent GCC look into libcpp/ (its preprocessor library, internal to the compiler).
If using the GCC compiler, you can get the preprocessed form of your source code file csource.c by running gcc -C -E csource.c > csource.i and then looking inside the generated preprocessed form csource.i (e.g. with a pager or an editor).
(I strongly recommend doing that once in a while; you'll learn a lot, and yes, you could be surprised by the amount of code pulled in by a usual #include <stdio.h> directive.)
I believe your book explains it wrongly. The preprocessor handles every preprocessing directive. When it encounters a #define, it stores the definition of that symbol in a preprocessor symbol table. When, after that #define, it encounters an occurrence of that preprocessor symbol, it does the appropriate substitution.
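A tiny illustration of that single-pass behaviour (running gcc -E on this fragment shows the substitution): occurrences of a symbol before its #define are left untouched, occurrences after it are replaced.

char early[BUFSIZE];   /* BUFSIZE not yet defined: left as-is (and this
                          declaration would not compile further) */
#define BUFSIZE 512
char late[BUFSIZE];    /* preprocessed to: char late[512]; */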
In the book The C Programming Language by K&R, page 88:
C provides certain language facilities by means of a preprocessor, which is conceptually a separate first step in compilation.
In the book Compilers: Principles, Techniques, and Tools by Aho, Lam, Sethi and Ullman, page 3:
The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor. The preprocessor may also expand shorthands, called macros, into source language statements. The modified source program is then fed to a compiler.
In the GNU GCC documentation:
The C preprocessor is a macro processor that is used automatically by the C compiler to transform your program before actual compilation.
And read this too.
So from these three official sources, one can say that the preprocessor is a separate program run by the compiler. So the statement in Let Us C by Yashwant P. Kanetkar that the preprocessor is a program that processes the source before the compiler, as its name suggests, is not wrong, and the expanded code can be seen in file.i.
Now let's come to your question,
In the book The C Programming Language by K&R, page 89:
Substitutions are made only for tokens, and do not take place within quoted strings.
And as Basile said in his answer:
In current C compilers (for performance reasons) the preprocessor is no longer a separate program, it is part of the compiler itself.
Compiling is a long process that passes through several phases. The preprocessor actually operates after the program has been broken into tokens, but, as the sources say, it is still part of what happens before compilation proper, that is, before any kind of intermediate code generation. And yes, breaking the program into tokens is the first step the compiler performs before any intermediate code generation.

Run GCC preprocessor on non-C files

I'm using a proprietary development environment that compiles code written in C, as well as the IEC 61131 languages. For the C compilation, it uses GCC 4.1.2 with these build options:
-fPIC -O0 -g -nostartfiles -Wall -trigraphs -fno-asm
The compilation is done by a program running on Windows utilizing Cygwin.
My issue is that the IEC language preprocessor is not that useful (it doesn't support #define at all) and I want to use macros! I don't see why the GCC preprocessor would really care what language it is processing (my target language is Structured Text), so I'm looking to see if anyone knows a way to get it to process files of other types that are then not compiled further (I'm just looking for macro expansion before the file is run through the IEC compiler). I'm very ignorant of compiler options and environments since I've never had to deal with them; I just write C code and it magically compiles and transfers to my target system to run.
The only things I can really do are add build options and execute a batch file before anything is executed. I think my best hope lies in using a batch file to process all files of a certain extension, but I don't even know which executable in the gnuinst folder to use, let alone what flags to use to run through the files.
Just about any C preprocessor, including gcc's cpp, is going to assume that its input is valid C code. It has to tokenize the input following C (or C++, or Objective-C) rules, because it has to resolve its input into tokens (more precisely, preprocessing tokens). Constructs above the token level shouldn't be an issue.
You certainly can use cpp or gcc -E to preprocess text that isn't C source code, but some input constructs will cause problems.
Taking an example from the comments:
$ cat foo.txt
#define ADDTHEM(x, y) ((x) + (y))
ADDTHEM(2, 3)
$ gcc -E - < foo.txt
# 1 "<stdin>"
# 1 "<command-line>"
# 1 "<stdin>"
((2) + (3))
Note that I had to use gcc -E - < foo.txt rather than gcc -E foo.txt, because gcc treats a .txt file as a linker input file by default.
But if you add some content to foo.txt that doesn't consist of valid C preprocessor tokens, you can have problems:
$ cat foo.txt
#define ADDTHEM(x, y) ((x) + (y))
ADDTHEM(2, 3)
ADDTHEM('c, "s)
$ gcc -E - < foo.txt
# 1 "<stdin>"
# 1 "<command-line>"
# 1 "<stdin>"
((2) + (3))
<stdin>:3:9: warning: missing terminating ' character [enabled by default]
<stdin>:3:0: error: unterminated argument list invoking macro "ADDTHEM"
ADDTHEM
(Attempts to feed Ada source code to a C preprocessor have run into this kind of problem, since Ada uses isolated apostrophe ' characters for its attribute syntax.)
So you can do it if the input language doesn't use things that aren't valid C preprocessor tokens.
See the N1570 draft of the C standard, section 6.4, for more information about preprocessing tokens.
I actually wrote the above before I checked the GNU cpp manual, which says:
The C preprocessor is intended to be used only with C, C++, and
Objective-C source code. In the past, it has been abused as a general
text processor. It will choke on input which does not obey C's lexical
rules. For example, apostrophes will be interpreted as the beginning of
character constants, and cause errors. Also, you cannot rely on it
preserving characteristics of the input which are not significant to
C-family languages. If a Makefile is preprocessed, all the hard tabs
will be removed, and the Makefile will not work.
Having said that, you can often get away with using cpp on things
which are not C. Other Algol-ish programming languages are often safe
(Pascal, Ada, etc.) So is assembly, with caution. `-traditional-cpp'
mode preserves more white space, and is otherwise more permissive. Many
of the problems can be avoided by writing C or C++ style comments
instead of native language comments, and keeping macros simple.
Wherever possible, you should use a preprocessor geared to the
language you are writing in. Modern versions of the GNU assembler have
macro facilities. Most high level programming languages have their own
conditional compilation and inclusion mechanism. If all else fails,
try a true general text processor, such as GNU M4.
(The authors of that manual apparently missed the problem with Ada's attribute syntax.)

Is there any C preprocessor as an independent program?

I know that the C preprocessor exists as part of the compiler, but I'm looking for an independent program. Is there any such tool?
It's often called cpp. For example, on my Linux box:
CPP(1) GNU CPP(1)
NAME
cpp - The C Preprocessor
SYNOPSIS
cpp [-Dmacro[=defn]...] [-Umacro]
[-Idir...] [-iquotedir...]
[-Wwarn...]
[-M|-MM] [-MG] [-MF filename]
[-MP] [-MQ target...]
[-MT target...]
[-P] [-fno-working-directory]
[-x language] [-std=standard]
infile outfile
This particular one is part of gcc and is available for a wide variety of platforms.
mcpp.
From the homepage:
mcpp is a C/C++ preprocessor with the following features.
Implements all of C90, C99 and C++98 specifications.
Provides a validation suite to test C/C++ preprocessor's conformance and quality comprehensively. When this validation suite is applied, mcpp distinguishes itself among many existing preprocessors.
Has plentiful and on-target diagnostics to check all the preprocessing problems such as latent bug or lack of portability in source code.
Has #pragma directives to output debugging information.
Is portable and has been ported to many compiler-systems, including GCC and Visual C++, on UNIX-like systems and Windows.
Has various behavior modes.
Can be built either as a compiler-specific preprocessor to replace the resident preprocessor of a particular compiler system, or as a compiler-independent command, or even as a subroutine called from some other main program.
Provides comprehensive documents both in Japanese and in English.
Is an open source software released under BSD-style-license.
You can also have a look at m4
What is m4?
M4 can be called a “template language”, a “macro language” or a “preprocessor language”. The name “m4” also refers to the program which processes texts in this language: this “preprocessor” or “macro processor” takes as input an m4 template and sends this to the output, after acting on any embedded directives, called macros.
I've used filepp for preprocessing files other than straight C. It's a Perl module, so it's pretty portable. It's handy in that you can use all the familiar idioms you are used to, and adds some useful features.
From the web site:
Why filepp and not plain old cpp?
cpp is designed specifically to generate output for the C compiler. Yes, you can use any file type with it, but the output it creates includes loads of blank lines and lines of the style:
# 1 "file.c"
Obviously these lines are very useful to the C-compiler, but no use in, say, an HTML file. Also, as filepp is written in Perl, it is 8-bit clean and so works on any character set, not just ASCII characters. filepp is also customisable and hopefully more user friendly than cpp.
cpp is just one. It's a separate program called by gcc when compiling.
It is a part of the package, and usually called cpp (C PreProcessor).
which cpp
# /usr/bin/cpp
man cpp

Using the C Preprocessor for languages other than C

The Wikipedia entry for the C Preprocessor states:
The language of preprocessor directives is agnostic to the grammar of C, so the C preprocessor can also be used independently to process other types of files.
How can this be done? Any examples or techniques?
EDIT: Yes, I'm mostly interested in macro processing. Even though it's probably not advisable or maintainable, it would still be useful to know what's possible.
You can call CPP directly:
cpp <file>
Rather than calling it through gcc:
gcc -E filename
Do note, however, that as mentioned in the same Wikipedia article, the C preprocessor's language is not really equipped for general-purpose use:
However, since the C preprocessor does not have features of some other
preprocessors, such as recursive macros, selective expansion according
to quoting, string evaluation in conditionals, and Turing
completeness, it is very limited in comparison to a more general macro
processor such as m4.
Have you considered dabbling with a more flexible macro processing language, like the aforementioned m4 for instance?
For example, Assembler. While many assemblers have their own way to #include headers and #define macros, it can be useful to use the C preprocessor for this. GNU make, for example, has implicit rules for turning *.S files into *.s files by running the preprocessor ('cpp'), before feeding the *.s file to the GNU assembler ('as').
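For instance, here is a minimal sketch of such a file, assuming Linux/x86-64 and GNU as syntax; it would be named with a capital .S extension so that make runs it through cpp before assembling:

/* cpp strips these C-style comments and expands the macro below
   before the assembler ever sees the file. */
#define SYS_exit 60

        .globl _start
_start:
        mov $SYS_exit, %rax   /* becomes: mov $60, %rax */
        xor %rdi, %rdi        /* exit status 0 */
        syscall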
Yes, it can be done by passing your own language through the gcc preprocessor (e.g. gcc -E).
We have done this at my job with our own specific language. It has quite a few advantages:
You can use C's include statements (#include), which are very powerful
You can use your #ifdef constructions
You can define constants (#define MAGIC_NUMBER 42) or macro functions (#define min(x,y) ((x) < (y) ? (x) : (y)))
... and the other things in the C preprocessor.
HOWEVER, you also inherit the unsafe C constructions, and having a preprocessor that is not integrated with your main language is the cause of it. Think about the min macro above and doing something like:
a = 2;
b = 3;
c = min(a--, b--);
Just think about what values a and b will have after the call to min.
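A sketch of what that call expands to, given the min macro defined above:

c = ((a--) < (b--) ? (a--) : (b--));
/* Both arguments are evaluated for the comparison and the chosen one is
   evaluated a second time, so with a = 2 and b = 3 you end up with
   a == 0, b == 2 and c == 1, which is almost certainly not what the
   caller intended. */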
The same is true for the untyped constants that you introduce.
See the Safer C book for details.
Many C compilers have a flag that tells them to only preprocess. With gcc it's the -E flag, e.g.:
$ gcc -E -
#define FOO foo
bar FOO baz
will output:
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "<stdin>"
bar foo baz
With other C compilers you'll have to check the manuals to see how to switch to preprocess-only mode.
Usually you can invoke the C compiler with an option to preprocess only (and to suppress the #line markers it would otherwise emit). Take this as a simple example:
<?php
function foo()
{
#ifdef DEBUG
echo "Some debug info.";
#endif
echo "Foo!";
}
foo();
We define a PHP source file with preprocess statements. We can then preprocess it (gcc can do this, too):
cl -nologo -EP foo.php > foo2.php
Since DEBUG is not defined, the first echo is stripped. A plus here is that lines beginning with # are comments in PHP, so you don't even have to preprocess the file for a "debug" build.
Edit: Since you asked about macros: this works fine too, and could be used to generate boilerplate code, etc.
Using Microsoft's compiler, I think (I just looked it up, haven't tested it) that it's the /P compiler option.
Other compilers presumably have similar options (or, for some compilers the preprocessor might actually be a different executable, which is usually run implicitly by the compiler but which you can also run explicitly separately).
Assuming you're using GCC, you can take any plain old text file, regardless of its contents, and run:
gcc -E filename
Any preprocessor directives in the file will be processed by the preprocessor and GCC will then exit.
The point is that it doesn't matter what the actual content of the text file is, since all the preprocessor cares about is its own directives.
I have heard of people using the C pre-processor on Ada code. Ada has no preprocessor, so you have to do something like that if you want to preprocess your code.
However, it was a conscious design decision not to give it one, so doing this is very un-Ada. I wouldn't suggest anyone do this.
A while ago I did some work on a project that used imake for makefile generation. As I recall, it basically used the C preprocessor syntax to generate the make files.
The C preprocessor can also be invoked by the Glasgow Haskell Compiler (GHC) prior to compiling Haskell code, by passing the -cpp flag.
You could implement the C preprocessor in the compiler for another language.
You could use it to preprocess any sort of text file, but there's much better things for that purpose.
Basically what it's saying is that preprocessors have nothing to do with C syntax. They are basically simple parsers that follow a set of rules. So you could use preprocessors kind of like you'd use sed or awk for some silly tasks. Don't ask me why you'd ever want to do it though.
For example, on a text file:
#define pi 3.141
pi is not an irrational number.
Then you run the preprocessor and you'd get:
3.141 is not an irrational number.
