Run the GCC preprocessor on non-C files - c

I'm using a proprietary development environment that compiles code written in C, as well as the IEC 61131 languages. For the C compilation, it uses GCC 4.1.2 with these build options:
-fPIC -O0 -g -nostartfiles -Wall -trigraphs -fno-asm
The compilation is done by a program running on Windows that uses Cygwin.
My issue is that the IEC language preprocessor is not very useful (it doesn't support #define at all), and I want to use macros! I don't see why the GCC preprocessor would really care what language it is processing (my target language is Structured Text), so I'm looking for a way to get it to process files of other types that are then not compiled further (I just want macro expansion before the file is run through the IEC compiler). I know very little about compiler options and environments since I've never had to deal with them; I just write C code and it magically compiles and transfers to my target system to run.
The only things I can really do are add build options and execute a batch file before anything is executed. I think my best hope lies in using a batch file to process all files of a certain extension, but I don't even know what executable in the gnuinst folder to use, let alone what flags to use to run through the files.

Just about any C preprocessor, including gcc's cpp, is going to assume that its input is valid C code. It has to tokenize the input following C (or C++, or Objective-C) rules, because it has to resolve its input into tokens (more precisely, preprocessing tokens). Constructs above the token level shouldn't be an issue.
You certainly can use cpp or gcc -E to preprocess text that isn't C source code, but some input constructs will cause problems.
Taking an example from the comments:
$ cat foo.txt
#define ADDTHEM(x, y) ((x) + (y))
ADDTHEM(2, 3)
$ gcc -E - < foo.txt
# 1 "<stdin>"
# 1 "<command-line>"
# 1 "<stdin>"
((2) + (3))
Note that I had to use gcc -E - < foo.txt rather than gcc -E foo.txt, because gcc treats a .txt file as a linker input file by default.
But if you add some content to foo.txt that doesn't consist of valid C preprocessor tokens, you can have problems:
$ cat foo.txt
#define ADDTHEM(x, y) ((x) + (y))
ADDTHEM(2, 3)
ADDTHEM('c, "s)
$ gcc -E - < foo.txt
# 1 "<stdin>"
# 1 "<command-line>"
# 1 "<stdin>"
((2) + (3))
<stdin>:3:9: warning: missing terminating ' character [enabled by default]
<stdin>:3:0: error: unterminated argument list invoking macro "ADDTHEM"
ADDTHEM
(Attempts to feed Ada source code to a C preprocessor have run into this kind of problem, since Ada uses isolated apostrophe ' characters for its attribute syntax.)
So you can do it if the input language doesn't use things that aren't valid C preprocessor tokens.
See the N1570 draft of the C standard, section 6.4, for more information about preprocessing tokens.
I actually wrote the above before I checked the GNU cpp manual, which says:
The C preprocessor is intended to be used only with C, C++, and
Objective-C source code. In the past, it has been abused as a general
text processor. It will choke on input which does not obey C's lexical
rules. For example, apostrophes will be interpreted as the beginning of
character constants, and cause errors. Also, you cannot rely on it
preserving characteristics of the input which are not significant to
C-family languages. If a Makefile is preprocessed, all the hard tabs
will be removed, and the Makefile will not work.
Having said that, you can often get away with using cpp on things
which are not C. Other Algol-ish programming languages are often safe
(Pascal, Ada, etc.) So is assembly, with caution. `-traditional-cpp'
mode preserves more white space, and is otherwise more permissive. Many
of the problems can be avoided by writing C or C++ style comments
instead of native language comments, and keeping macros simple.
Wherever possible, you should use a preprocessor geared to the
language you are writing in. Modern versions of the GNU assembler have
macro facilities. Most high level programming languages have their own
conditional compilation and inclusion mechanism. If all else fails,
try a true general text processor, such as GNU M4.
(The authors of that manual apparently missed the problem with Ada's attribute syntax.)

Related

Elegantly adding m4 macro processor into gcc compilation chain? [duplicate]

Could you please give me an example of writing a custom gcc preprocessor?
My goal is to replace SID("foo") alike macros with appropriate CRC32 computed values. For any other macro I'd like to use the standard cpp preprocessor.
It looks like it's possible to achieve this goal using -no-integrated-cpp -B options, however I can't find any simple example of their usage.
Warning: dangerous and ugly hack. Close your eyes now. You can hook in your own preprocessor by adding the '-no-integrated-cpp' and '-B' switches to the gcc command line. '-no-integrated-cpp' makes gcc run preprocessing as a separate pass, and '-B' makes it search the given path for its subprograms before it uses its internal search path. The invocations of the preprocessor can be identified because the 'cc1', 'cc1plus' or 'cc1obj' programs (these are the C, C++ and Objective-C compilers) are invoked with the '-E' option. When there is no '-E' option, pass all the parameters to the original programs. When there is such an option, you can do your own preprocessing and pass the manipulated file to the original compiler.
It looks like this:
> cat cc1
#!/bin/sh
echo "My own special preprocessor -- $@"
/usr/lib/gcc/i486-linux-gnu/4.3/cc1 "$@"
exit $?
> chmod 755 cc1
> gcc -no-integrated-cpp -B$PWD x.c
My own special preprocessor -- -E -quiet x.c -mtune=generic -o /tmp/cc68tIbc.i
My own special preprocessor -- -fpreprocessed /tmp/cc68tIbc.i -quiet -dumpbase x.c -mtune=generic -auxbase x -o /tmp/cc0WGHdh.s
This example calls the original preprocessor, but prints an additional message and the parameters. You can replace the script by your own preprocessor.
The bad hack is over. You can open your eyes now.
One way is to use a program transformation system, to "rewrite" just the SID macro invocation to what you want before you do the compilation, leaving the rest of the preprocessor handling to the compiler itself.
Our DMS Software Reengineering Toolkit is such a system; it can be applied to many languages, including C and specifically the GCC 2/3/4 series of compilers.
To implement this idea using DMS, you would run DMS with its C front end over your source code before the compilation step. DMS can parse the code without expanding the preprocessor directives, build abstract syntax trees representing it, carry out transformations on the ASTs, and then spit out the result as compilable C text.
The specific transformation rule you would use is:
rule replace_SID_invocation(s:STRING):expression->expression
= "SID(\s)" -> ComputeCRC32(s);
where ComputeCRC32 is custom code that does what it says. (DMS includes a CRC32 implementation, so the custom code for this is pretty short.)
DMS is kind of a big hammer for this task. You could use Perl to implement something pretty similar. The difference with Perl (or some other string match/replace hack) is the risk that it might a) find the pattern someplace where you don't want a replacement, e.g.
... QSID("foo")... // this isn't a SID invocation
which you can probably fix by coding your pattern match carefully, b) fail to match a SID call found in surprising circumstances:
... SID ( /* master login id */ "Joel" ) ... // need to account for formatting and whitespace
and c) fail to handle the various kinds of escape characters that show up in the literal string itself:
... SID("f\no\072") ... // need to handle all of GCC's weird escapes
DMS's C front end handles all the escapes for you; the ComputeCRC32 function above would see the string containing the actual intended characters, not the raw text you see in the source code.
So it's really a matter of whether you care about the dark-corner cases, or if you think you may have more special processing to do.
Given the way you've described the problem, I'd be sorely tempted to go the Perl route first and simply outlaw the funny cases. If you can't do this, then the big hammer makes sense.

Cpp : How to understand and/or debug complex macros?

I am trying to learn preprocessor tricks that I found not so easy (Can we have recursive macros?, Is there a way to use C++ preprocessor stringification on variadic macro arguments?, C++ preprocessor __VA_ARGS__ number of arguments, Variadic macro trick, ...). I know the -E option to see the result of the preprocessor whole pass but I would like to know, if options or means exist to see the result step by step. Indeed, sometimes it is difficult to follow what happens when a macro calls a macro that calls a macro ... with the mechanism of disabling context, painting blue ... In brief, I wonder if a sort of preprocessor debugger with breakpoints and other tools exists.
(Do not answer that this use of preprocessor directives is dangerous, ugly, horrible, not good practices in C, produces unreadable code ... I am aware of that and it is not the question).
Yes, this tool exists as a feature of Eclipse IDE. I think the default way to access the feature is to hover over a macro you want to see expanded (this will show the full expansion) and then press F2 on your keyboard (a popup appears that allows you to step through each expansion).
When I used this tool to learn more about macros it was very helpful. With just a little practice, you won't need it anymore.
In case anyone is confused about how to use this feature, I found a tutorial on the Eclipse documentation here.
This answer to another question is relevant.
When you do weird preprocessor tricks (which are legitimate) it is useful to ask the compiler to generate the preprocessed form (e.g. with gcc -C -E if using GCC) and look into that preprocessed form.
In practice, for a source file foo.c it makes (sometimes) sense to get its preprocessed form foo.i with gcc -C -E foo.c > foo.i and look into that foo.i.
Sometimes, it even makes sense to get that foo.i without line information. The trick here (removing line information contained in lines starting with #) would be to do:
gcc -C -E foo.c | grep -v '^#' > foo.i
Then you could indent foo.i and compile it, e.g. with gcc -Wall -c foo.i; you'll get error locations in the preprocessed file and you could understand how you got that and go back to your preprocessor macros (or their invocations).
Remember that the C preprocessor is mostly a textual transformation working at the file level. It is not possible to macro-expand a few lines in isolation (because prior lines might have played with #if combined with #define -perhaps in prior #include-d files- or preprocessor options such as -DNDEBUG passed to gcc or g++). On Linux see also feature_test_macros(7)
A known example of expansion which works differently when compiled with or without -DNDEBUG passed to the compiler is assert. The meaning of assert(i++ > 0) (a very wrong thing to code) depends on it and illustrates that macro-expansion cannot be done locally (and you might imagine some prior header having #define NDEBUG 1 even if of course it is poor taste).
Another example (very common actually) where the macro expansion is context dependent is any macro using __LINE__ or __COUNTER__
...
NB. You don't need Eclipse for all that, just a good enough source code editor (my preference is emacs but that is a matter of taste): for the preprocessing task you can use your compiler.
A straightforward way to see what is wrong with your macro is to add the option that keeps the temporary files when compilation completes. For gcc it is the -save-temps option. You can then open the .i file and see the expanded macros.
IDE indexers (like Eclipse's) will not help much. They will not expand the macros (as the other answer states) until the error occurs.

Which file is generated after preprocessing of a C program?

For a C program just before compilation, i.e. after the pre-processing has been completed which file(what extension) is generated?
It is compiler dependent. Most compilers by default don't generate intermediate pre-processor files.
With gcc, if you add -save-temps option to get the intermediate files, the output of the pre-processor is dumped in a .i file. With -E option (to perform only the pre-processing), without -o to specify the output file, the result is dumped to stdout.
In most current compilers (e.g. GCC or Clang/LLVM) - and for performance reasons - the C/C++ preprocessor is an internal part of the compiler (in GCC it is libcpp/ and is a library ...), so no preprocessed form is output into a file.
In the very first C or proto-C compilers (1970s PDP-11) the memory was so small (64 kilobytes!) that such an organization was not possible, and the preprocessor was a separate program, /lib/cpp.
Today, our laptops have several gigabytes of memory, which is usually much larger than the preprocessed form (of the largest source file you'll feed to your compiler). So current compilers keep some internal representation of the whole translation unit and are able to optimize it entirely (inter-procedural optimizations, including inlining).
All compilers keep several forms of the abstract syntax tree (AST); the bulk of the work of a compiler is not parsing or code generation, but transforming some internal representation of the AST into another internal representation (itself further transformed). In GCC most of the optimizations are working on the GIMPLE form. You can extend the compiler by adding your own optimization passes, e.g. with your GCC plugin.
In turn, this technological evolution has influenced the definition of our programming languages: the recent C++11 is designed with a highly optimizing compiler in mind. The recent style guides and coding hints around C++11 presuppose (and make sense only because of) very powerful optimizations.
You can still usually invoke the compiler to emit the preprocessed form into a separate file, e.g. with gcc -C -E source.c > source.i (conventionally suffixed .i or .ii; such suffixes can be known to build tools like make).
Journey of a C Program to Linux Executable in 4 Stages :
Pre-processing
Compilation
Assembly
Linking
Check this link for more details C program compilation process
gcc -E fname.c > fname.x
(fname.x is the file in which you are saving the preprocessed output.)
The following four things happen in the preprocessing stage:
1> header file inclusion
2> comment removal
3> macro substitution (e.g. if you have #define NUM 10, every NUM used in the code is replaced by 10)
4> conditional compilation, e.g.
#if 0
...
some code
...
#endif
Since "#if 0" evaluates to false, the preprocessor drops the code under it, so that code is not included in your preprocessed output and is never compiled.

Listing all the #defines in a C program

Is it possible to get the list of #defines (both those given at compile time and those defined in the source code) used in a C program during execution?
I ask because I have a project with a lot of C source files.
Is there a compile-time option to get that?
GNU cpp takes various -d options to output macro and define data. See their man pages for more details.
for gcc, you can use one of the following:
-dCHARS CHARS is a sequence of one or more of the following characters, and must not be preceded by a space. Other characters are interpreted by the compiler proper, or reserved for future versions of GCC, and so are silently ignored. If you specify characters whose behavior conflicts, the result is undefined.
`M'
Instead of the normal output, generate a list of `#define' directives for all the macros defined during the execution of the preprocessor, including predefined macros. This gives you a way of finding out what is predefined in your version of the preprocessor. Assuming you have no file foo.h, the command
touch foo.h; cpp -dM foo.h
will show all the predefined macros.
If you use -dM without the -E option, -dM is interpreted as a synonym for -fdump-rtl-mach. See Debugging Options.
`D'
Like `M' except in two respects: it does not include the predefined macros, and it outputs both the `#define' directives and the result of preprocessing. Both kinds of output go to the standard output file.
`N'
Like `D', but emit only the macro names, not their expansions.
`I'
Output `#include' directives in addition to the result of preprocessing.
`U'
Like `D' except that only macros that are expanded, or whose definedness is tested in preprocessor directives, are output; the output is delayed until the use or test of the macro; and `#undef' directives are also output for macros tested but undefined at the time.
In gcc the command you probably want is
gcc -dM -E [your_source_files]
I know this is implicitly in the above answers, but perhaps someone needs (like myself) the quick recipe.

Using the C Preprocessor for languages other than C

The Wikipedia entry for the C Preprocessor states:
The language of preprocessor
directives is agnostic to the grammar
of C, so the C preprocessor can also
be used independently to process other
types of files.
How can this be done? Any examples or techniques?
EDIT: Yes, I'm mostly interested in macro processing. Even though it's probably not advisable or maintainable it would still be useful to know what's possible.
You can call CPP directly:
cpp <file>
Rather than calling it through gcc:
gcc -E filename
Do note however that, as mentioned in the same Wikipedia article, C preprocessor's language is not really equipped for general-purpose use:
However, since the C preprocessor does not have features of some other
preprocessors, such as recursive macros, selective expansion according
to quoting, string evaluation in conditionals, and Turing
completeness, it is very limited in comparison to a more general macro
processor such as m4.
Have you considered dabbling with a more flexible macro processing language, like the aforementioned m4 for instance?
For example, Assembler. While many assemblers have their own way to #include headers and #define macros, it can be useful to use the C preprocessor for this. GNU make, for example, has implicit rules for turning *.S files into *.s files by running the preprocessor ('cpp'), before feeding the *.s file to the GNU assembler ('as').
Yes, it can be done by parsing your own language through the gcc preprocessor (e.g. 'gcc -E').
We have done this at my job with our own specific language. It has quite a few advantages:
You can use C's include statements (#include) which is very powerful
You can use your #ifdef constructions
You can define constants (#define MAGIC_NUMBER 42) or macro functions (#define min(x,y) ((x) < (y) ? (x) : (y)))
... and the other things in the C preprocessor.
HOWEVER, you also inherit the unsafe C constructions, and having a preprocessor not integrated with your main language is the cause of it. Think about the minimum macro and doing something like :
a = 2;
b = 3;
c = min(a--, b--);
Just think: what values will a and b have after the min macro is expanded?
The same is true of the untyped constants that you introduce.
See the Safer C book for details.
Many C compilers have a flag that tells them to only preprocess. With gcc it's the -E flag. eg:
$ gcc -E -
#define FOO foo
bar FOO baz
will output:
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "<stdin>"
bar foo baz
With other C compilers you'll have to check the manuals to see how to switch to preprocess-only mode.
Usually you can invoke the C compiler with an option to preprocess only (and ignore any #line statements). Take this as a simple example:
<?php
function foo()
{
#ifdef DEBUG
echo "Some debug info.";
#endif
echo "Foo!";
}
foo();
We define a PHP source file with preprocess statements. We can then preprocess it (gcc can do this, too):
cl -nologo -EP foo.php > foo2.php
Since DEBUG is not defined, the first echo is stripped. A plus here is that lines beginning with # are comments in PHP, so you don't have to preprocess them for a "debug" build.
Edit: Since you asked about macros. This works fine too and could be used to generate boilerplate code etc.
Using Microsoft's compiler, I think (I just looked it up, haven't tested it) that it's the /P compiler option.
Other compilers presumably have similar options (or, for some compilers the preprocessor might actually be a different executable, which is usually run implicitly by the compiler but which you can also run explicitly separately).
Assuming you're using GCC, you can take a plain old text file and run:
gcc -E -x c filename
(The -x c tells GCC to treat the file as C source even if it does not recognize the extension.)
Any preprocessor directives in the file will be processed by the preprocessor and GCC will then exit.
The point is that the preprocessor mostly cares about its own directives and macro names, not about the grammar of the surrounding text, although that text still has to split into valid C preprocessing tokens.
I have heard of people using the C pre-processor on Ada code. Ada has no preprocessor, so you have to do something like that if you want to preprocess your code.
However, it was a conscious design decision not to give it one, so doing this is very un-Ada. I wouldn't suggest anyone do this.
A while ago I did some work on a project that used imake for makefile generation. As I recall, it basically used the C preprocessor syntax to generate the makefiles.
The C preprocessor can also be invoked by the Glasgow Haskell Compiler (GHC) prior to compiling Haskell code, by passing the -cpp flag.
You could implement the C preprocessor in the compiler for another language.
You could use it to preprocess any sort of text file, but there's much better things for that purpose.
Basically what it's saying is that preprocessors have nothing to do with C syntax. They are basically simple parsers that follow a set of rules. So you could use preprocessors kind of like you'd use sed or awk for some silly tasks. Don't ask me why you'd ever want to do it though.
For example, on a text file:
#define pi 3.141
pi is not an irrational number.
Then you run the preprocessor & you'd get.
3.141 is not an irrational number.

Resources