Partial preprocessing of C files - c

I'm searching for a preprocessor which allows for partial preprocessing of C source files. What I want to do is to skip certain or all #include and to define some macros on the command line (or a separate file), the preprocessor then should only process what I specified and ignore the rest.
Here is an example of what I want to do:
#define FOO
#ifdef FOO
/* FOO */
#endif
#ifdef BAR
/* BAR */
#endif
should be translated into
/* FOO */
#ifdef BAR
/* BAR */
#endif
Some time ago at my previous job I needed a similar preprocessor and I think that I found a link to a standalone preprocessor here on stackoverflow, however after an hour of searching the web I gave up.

You might be looking for coan, which can interpret #if and #ifdef directives given a set of definitions (or undefinitions) supplied on the command line.
coan is based on the unifdef utility listed in this related question: Partially processing a file with the preprocessor.
Also see Is there a C preprocessor that eliminates #ifdefs but also evaluates preprocessor macros? which has an example invocation.

Related

What happens when preprocessor lines are processed by the preprocessor? - the '.i' file

I am using GCC (the GNU C compiler) to compile my C programs. Consider this program:
#include <stdio.h>
int main(){
return 0;
}
Now, when I pre-process the above code, using
cpp sample.c > sample.i
I get a lot of content in sample.i that I didn't write. I assume the 'stdio.h' file has been preprocessed. If that is the case,
Question 1:
Why are there so many lines in my preprocessed file? I haven't used any of the standard library functions nor Macros.
Question 2:
Can anyone explain what exactly happens when the preprocessor processes the C file? (The contents that I got in my '*.i' file.)
Compiler: gcc
OS: Ubuntu
Thanks
Why are there so many lines in my preprocessed file? I haven't used any of the standard library functions nor Macros.
Preprocessing is just one part of the compilation process. It's more or less a simple textual replacement; nothing more complex is involved at the preprocessing stage. The preprocessor does not know or care whether you have used any standard functions in your program. An optimizer (as part of the compilation process) might "remove" parts that are not needed, but the preprocessor doesn't do that.
It'll do preprocessing of all the header files you have included and other header files included via your header files and so on.
Can anyone explain what exactly happens when the preprocessor processes the C file? (The contents that I got in my '*.i' file.)
The preprocessing involves quite a few tasks: macro replacement, conditional compilation, stringification, string concatenation etc.
You can read more about cpp in detail here: https://gcc.gnu.org/onlinedocs/cpp/
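To make the tasks listed above concrete, here is a minimal sketch that exercises each of them in one file. All names (SQUARE, STRINGIFY, PASTE, DEMO_VERBOSE, DEMO_MODE and the demo_* functions) are made up for illustration. Note that joining adjacent string literals is, strictly speaking, done in a translation phase after macro expansion, so the output of cpp still shows two separate literals.

```c
#include <string.h>

/* Macro replacement: every SQUARE(x) is rewritten textually before compilation. */
#define SQUARE(x) ((x) * (x))

/* Stringification: #x turns the spelling of the argument into a string literal. */
#define STRINGIFY(x) #x

/* Token pasting: a##b glues two tokens into a single identifier. */
#define PASTE(a, b) a##b

/* Conditional compilation: only one branch survives preprocessing.
   DEMO_VERBOSE is a hypothetical symbol that this file never defines. */
#ifdef DEMO_VERBOSE
#define DEMO_MODE "verbose"
#else
#define DEMO_MODE "quiet"
#endif

/* Expands to ((1 + 2) * (1 + 2)). */
int demo_square(void) { return SQUARE(1 + 2); }

/* STRINGIFY(a + b) expands to the literal "a + b". */
int demo_stringify_ok(void) { return strcmp(STRINGIFY(a + b), "a + b") == 0; }

/* PASTE(val, ue) becomes the identifier `value`. */
int demo_paste(void) { int PASTE(val, ue) = 7; return value; }

/* The adjacent literals "mode: " "quiet" are joined into one string
   in a later translation phase. */
int demo_concat_ok(void) { return strcmp("mode: " DEMO_MODE, "mode: quiet") == 0; }
```

Running cpp on this file and reading the result is a quick way to see each replacement; compiling with -DDEMO_VERBOSE would select the other branch.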
The preprocessor directive #include "aFile.h" puts the whole content of aFile.h into your source file, at exactly the place where the directive stands. That is why you can use the functions declared in aFile.h.
If you are interested in learning more about the preprocessor, there is a very good (and short) guide on cplusplus.com.
The preprocessor does text substitution. The net effect of #include <stdio.h> is to replace the #include <stdio.h> line with the contents of <stdio.h>.
Practically, <stdio.h> contains several declarations of various functions (e.g. fprintf(), fscanf()), declarations of variables (e.g. stdout, stdin), and some macro definitions (which, when used in later code, cause text substitution).
The preprocessor is specified as a phase of compilation, which takes source code as input, substitutes text as required (e.g. the #include as I have described, macro expansions, etc), and outputs the resultant source code. That output is what you are directing into sample.i
The output of the preprocessor is then input to a later phase of compilation, which actually understands declarations, definitions, statements, etc.
The phases of compilation are sequential - they occur one after the other, not all at once. So the later phase of compilation feeds no information whatsoever back to the preprocessor. It is the later phase of compilation that detects if declarations etc are used. But, since it cannot feed such information back to the preprocessor (and the preprocessor is an ignorant program that couldn't use such information anyway) the preprocessor cannot know that declarations are unused, and filter them out.
1) You may not use them, but you have included them in line 1
#include <stdio.h>
That's where what you see comes from. Try removing it to see the difference.
2) The preprocessor reads your C file and processes all the preprocessor directives in it. All preprocessor directives start with a '#' symbol. '#include' replaces its line with the content of the given file. You also have the classic '#ifndef' and '#define' directives. The former works like an 'if' statement: it lets you activate part of the code only if a symbol is not yet defined.
#ifndef _SOME_SYMBOL_
#define _SOME_SYMBOL_
#ifndef WIN32
#include <some_file.h>
#else
#include <some_other_file.h>
#endif
int main() { return 0;}
#endif //endof _SOME_SYMBOL_
#ifndef _SOME_SYMBOL_
#define _SOME_SYMBOL_
// this second function is ignored
int main() { return 0;}
#endif //endof _SOME_SYMBOL_
When the preprocessor reads the above file, the symbol "_SOME_SYMBOL_" is unknown the first time, so the preprocessor defines it. Next it includes one file or the other, depending on whether WIN32 is defined. Usually this kind of symbol is passed through the command line, so part of your code is dynamically activated or deactivated.
The preprocessor will output this
void some_other_function_from_some_other_file(){}
int main() { return 0;}

Doxygen Document All Conditional Defines

I have a project with a substantial number of conditional defines that make cross-platform development easier. However, I'm having trouble convincing Doxygen to extract all the defines, as it only picks up the ones in branches that happen to be evaluated.
For example in the following snippet, Doxygen will document TARGET_X86_64 but not TARGET_ARM64.
#if defined(_M_ARM64) || defined(__arm64__) || defined(__aarch64__)
/** Build target is ARM64 if defined. */
#define TARGET_ARM64
#else
/** Build target is x86_64 if defined. */
#define TARGET_X86_64
#endif
Enabling EXTRACT_ALL did not help, and disabling preprocessing causes Doxygen to not document anything at all. How do I get doxygen to extract documentation for both cases?
I made a "solution" that's verbose, but works. It's less awkward when you want #elif branches than using only plain #if statements, although either would work.
First, define everything and not care about conditional logic.
/** Some define */
#define TARGET_DEFINE
/** Some other define */
#define OTHER_TARGET_DEFINE
Second, take the conditional logic that you originally used to create the defines and transform it to undefine logic.
#if !(ORIGINAL_LOGIC)
#undef TARGET_DEFINE
#endif
Lastly, change the conditional logic so that nothing is undefined when doxygen is doing the preprocessing.
#if !defined(DOXYGEN)
...

How does the preprocessor know to translate HEADER_H to header.h?

Per this question, it seems there is some flexibility to how you can write that--
#ifndef _HEADER_H
or:
#ifndef __HEADER___H__
etc. It's not set in stone.
But I don't understand why we're using underscores at all in the first place. Why can't I just write:
#ifndef header.h
What's wrong with that? Why are we placing underscores everywhere and capitalizing everything? What does the preprocessor do with underscores?
header.h is not a valid identifier. You cannot have a period in a macro name.
That said, the name you pick for your include guard macros is completely arbitrary. After all, it's just another macro name. It is purely convention (and reasonable in order to avoid clashes) to name them after the file.
I encourage you to read the header structure out loud to see what the preprocessor does.
#ifndef MY_HEADER_H /* If the macro MY_HEADER_H is not defined (yet)... */
#define MY_HEADER_H /* ... then define it now ... */
... /* ... and deal with all this stuff ... */
#endif /* ... otherwise, skip all over it and go here. */
You see that this mechanism works equally well if you substitute MY_HEADER_H with I_REALLY_LIKE_BANANAS or whatever. The only requirement is that it be a valid macro identifier and not clash with the name of any other include guard.
In the above example, the macro is defined empty. That's fine, but it is not the only option. The second line could equally well read
#define MY_HEADER_H 1
which would then define the macro to 1. Some people do this but it doesn't really add anything and the value 1 is rather arbitrary. I generally don't do this. The only advantage is that if you define it to 1, you can also use #if in addition to #ifdef.
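The difference between the two forms can be sketched as follows. GUARD_EMPTY, GUARD_ONE, GUARD_NEVER_DEFINED and the helper names are hypothetical and exist only for this illustration; the point is that #ifdef accepts both styles, while #if needs the macro to expand to a value.

```c
#define GUARD_EMPTY      /* defined, with empty replacement text */
#define GUARD_ONE 1      /* defined, expands to 1 */

/* #ifdef only asks "is this name defined?", so both styles pass. */
#ifdef GUARD_EMPTY
#define EMPTY_PASSES_IFDEF 1
#else
#define EMPTY_PASSES_IFDEF 0
#endif

#ifdef GUARD_ONE
#define ONE_PASSES_IFDEF 1
#else
#define ONE_PASSES_IFDEF 0
#endif

/* #if evaluates an expression, so it needs a value: GUARD_ONE works.
   "#if GUARD_EMPTY" would expand to a bare "#if" with no expression,
   which is an error, so it is deliberately left out of this file. */
#if GUARD_ONE
#define ONE_PASSES_IF 1
#else
#define ONE_PASSES_IF 0
#endif

/* A name that was never defined is treated as 0 inside #if. */
#if GUARD_NEVER_DEFINED
#define UNDEFINED_PASSES_IF 1
#else
#define UNDEFINED_PASSES_IF 0
#endif

int empty_passes_ifdef(void)  { return EMPTY_PASSES_IFDEF; }
int one_passes_ifdef(void)    { return ONE_PASSES_IFDEF; }
int one_passes_if(void)       { return ONE_PASSES_IF; }
int undefined_passes_if(void) { return UNDEFINED_PASSES_IF; }
```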
A final word of caution: In C, identifiers that begin with an underscore followed by an uppercase letter or a second underscore are reserved for the implementation, and C++ additionally reserves any identifier containing two consecutive underscores, so none of these should be used in user code. Hence, _MY_HEADER_H and __MY_HEADER__H__ are both unfortunate choices.
The logic by which the preprocessor finds the correct header file if you say
#include <myheader.h>
is completely unrelated. Here, myheader.h names a file and the preprocessor will search for it in a number of directories (that usually can be configured via the -I command line option). Only after it has found and opened the file will it go ahead and parse it, and it will then find the include guards that cause it to essentially skip over the file if it has already parsed it before (the include guard macro is then already defined, so the first check evaluates to false).
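The skipping behaviour can be imitated in a single file by pasting the text of a guarded header twice, which is roughly what two #include lines for the same file amount to. This is only a sketch: the header "banana.h", struct banana and make_banana are hypothetical, and the guard name is borrowed from the example above.

```c
/* ---- first inclusion of the hypothetical "banana.h" ---- */
#ifndef I_REALLY_LIKE_BANANAS   /* not defined yet, so the block is processed */
#define I_REALLY_LIKE_BANANAS
struct banana { int ripeness; };
#endif

/* ---- second inclusion of the same text ---- */
#ifndef I_REALLY_LIKE_BANANAS   /* already defined, so everything up to #endif is skipped */
#define I_REALLY_LIKE_BANANAS
struct banana { int ripeness; }; /* would be a redefinition error without the guard */
#endif

/* The struct is declared exactly once and can be used normally. */
struct banana make_banana(int ripeness)
{
    struct banana b = { ripeness };
    return b;
}
```

Removing the two guard lines from the second copy should make the compiler complain about the redefinition of struct banana.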
Because #ifdef or #ifndef requires a preprocessor symbol after it, and these symbols cannot contain dots.
In the C11 (latest draft) spec n1570 (§6.10.1):
Preprocessing directives of the forms
# ifdef identifier new-line group_opt
# ifndef identifier new-line group_opt
check whether the identifier is or is not currently defined as a macro name.
and identifiers cannot contain dots (§6.4.2.1)
BTW, include guards are not required to use #ifndef symbols related to the file name. You can have a header file foo.h guarded with #ifndef JESUISCHARLIEHEBDO or with #ifndef I_LOVE_PINK_ROSES_BUT_NOT_YELLOW_ONES if you like. But by human convention, the names are often related.
Notice that identifiers starting with an underscore followed by an uppercase letter are reserved for the implementation, so you should avoid #ifndef _FOO_INCLUDED and prefer #ifndef FOO_INCLUDED.

Multiple Include Optimization

I'm trying to understand how multiple-include optimization works with gcc.
Lately, I've been reading a lot of code that has include guards for standard header files, like so
#ifndef _STDIO_H_
#include <stdio.h>
#endif
and I'm trying to figure out if this construct has any benefits.
Here's an example I wrote to understand this a little better.
header1.h
#ifndef _HDR_H_
#define _HDR_H_
#define A (32)
#endif
header2.h
#ifndef _HDR_H_
#define _HDR_H_
#define A (64)
#endif
hdr.c
#include <stdio.h>
#include "header1.h"
#include "header2.h"
int main()
{
printf("%d\n", A);
return 0;
}
Note that both header1.h and header2.h use the same include guard. As expected this program outputs the value of A defined in header1.h; header2.h is skipped since it uses the same include guard.
Here's what I'm trying to understand
At what point when parsing header2.h does the preprocessor skip this file? My understanding is that it skips this file immediately after the #if directive on line 1, i.e. it does not have to wait for the matching #endif. Is this correct?
What can I add to the example above to demonstrate how this works?
EDIT: Thanks everyone for the answers. This is starting to make more sense now. A follow up question. The page linked to on the first line of this post has the following text
The preprocessor notices such header files, so that if the header file
appears in a subsequent #include directive and FOO is defined, then it
is ignored and it doesn't preprocess or even re-open the file a second
time. This is referred to as the multiple include optimization.
If I understand this correctly, this means that any header file is read only once even if it is included multiple times in a given compilation. And so, additional include guards in application code or header files provide no benefit.
At what point when parsing header2.h does the preprocessor skip this file?
As @Sean says, header2.h will never be skipped, but the content between the #ifndef ... #endif will be ignored in this case.
What can I add to the example above to demonstrate how this works?
Add something (for example, a #define B 123) after the #endif in header2.h. Now try to access it in the main. It will be accessible.
Now, try to add it before the #endif. You'll see that it's not accessible in main.
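Both experiments can be combined in one self-contained sketch by pasting the text of the two headers into a single file. The guard is spelled HDR_H here (an assumption, to stay clear of reserved names), and B_INSIDE, B_OUTSIDE and the helper functions are hypothetical names for the two placements.

```c
/* ---- text of header1.h ---- */
#ifndef HDR_H
#define HDR_H
#define A (32)
#endif

/* ---- text of header2.h: same guard, with two additions ---- */
#ifndef HDR_H
#define HDR_H
#define A (64)
#define B_INSIDE 456      /* before #endif: skipped with the rest of the block */
#endif
#define B_OUTSIDE 123     /* after #endif: always processed */

/* A keeps the value from header1.h. */
int value_of_a(void) { return A; }

/* The definition placed after the #endif is visible. */
int value_of_b_outside(void) { return B_OUTSIDE; }

/* The definition placed inside the guarded block never happened. */
int b_inside_is_defined(void)
{
#ifdef B_INSIDE
    return 1;
#else
    return 0;
#endif
}
```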
At what point when parsing header2.h does the preprocessor skip this file?
The file is not skipped.
My understanding is that it skips this file immediately after the #if directive on line 1, i.e. it does not have to wait for the matching #endif. Is this correct?
Yes and no. Some compilers identify the sentry macro when they parse the first header file and, if they find it in a second file, immediately stop parsing. Other compilers will parse the header again (looking for the matching #endif).
What can I add to the example above to demonstrate how this works?
Add a print message inside and outside the sentry macro
#ifndef _HEADER_INCLUDED
#define _HEADER_INCLUDED
...
#pragma message ("inside sentry in " __FILE__ "\n")
#endif // #ifndef _HEADER_INCLUDED
#pragma message ("outside sentry in " __FILE__ "\n")
Relevant material:
You can use #pragma once instead of the sentry macro. Faster compilation since very little of the file is parsed. No worries about macro name collisions.
You can wrap the #include directives in checks for the sentry macro so the header file isn't loaded again. This is usually done in library headers that include the same headers many times. It can significantly speed up compilation at the expense of ugly code:
#ifndef __LIST_H_
#include "list.h"
#endif
The pre-processor will never skip header2.h. It will always include it, and when expanding it will ignore the stuff in the #ifndef block.
In your example A will be 32, as the #define in header2.h will never be reached. If it were reached, you'd get some sort of "macro redefinition" error, as you'd have multiple #defines for "A". To fix this you'd need to #undef A.
Most compilers support the #pragma once directive these days to save you having to write include guards in header files.
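The #undef fix mentioned above can be sketched like this. The function names are hypothetical; the sketch also shows that a macro's value at each use is whatever definition is in effect at that point in the text.

```c
#define A (32)

/* At this point A expands to (32). */
int a_first(void) { return A; }

/* Defining A again with different replacement text without this #undef
   would be diagnosed by the compiler as a macro redefinition. */
#undef A
#define A (64)

/* From here on A expands to (64); a_first() is unaffected. */
int a_second(void) { return A; }
```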
The preprocessor blocks all input that follows a false #if[[n]def] from going through to the subsequent compiler steps.
The preprocessor does, however, continue reading the input, to keep track of the nesting depth of all those conditional compilation #-directives.
When it finds the matching #endif, of where it started blocking input, it simply stops blocking.
If I understand this correctly, this means that any header file is read only once even if it is included multiple times in a given compilation. And so, additional include guards in application code or header files provide no benefit.
No. GCC only performs this optimization for files that it knows to be safe, following these rules:
There must be no tokens outside the controlling #if-#endif pair, but whitespace and comments are permitted.
There must be no directives outside the controlling directive pair, but the null directive (a line containing nothing other than a single ‘#’ and possibly whitespace) is permitted.
The opening directive must be of the form
#ifndef FOO
or
#if !defined FOO [equivalently, #if !defined(FOO)]
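As an illustration, a header laid out to satisfy the rules quoted above might look like the sketch below. The file name shapes.h, the guard SHAPES_H and the function shapes_area are hypothetical; whether the optimization actually applies is decided by the compiler.

```c
/* shapes.h (hypothetical): comments and blank lines may precede the guard. */

#if !defined(SHAPES_H)   /* an accepted opening form, same as #ifndef SHAPES_H */
#define SHAPES_H

/* Everything the header provides sits inside the controlling pair. */
static inline int shapes_area(int width, int height)
{
    return width * height;
}

#endif /* SHAPES_H */
/* Only whitespace and comments may follow the closing #endif. */
```

By the rules above, adding any token or directive after the closing #endif (for example a stray #define) would put the header outside the pattern.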

Why does my #define macro appear to be a global?

I was investigating a compile and link issue within my program when I came across the following macro that was defined in a header and source file:
/* file_A.c */
#ifndef _NVSize
#define _NVSize 1
#endif
/* file_B.c */
#include "My_Header.h"
#ifndef _NVSize
#define _NVSize 1
#endif
/* My_Header.h */
#define _NVSize 1024
Nothing out of the ordinary yet, until I saw the following information in the GCC output map file:
/* My Map File */
...
.rodata 0x08015694 _NVSize
...
My understanding of the map file is that if you see a symbol in the .rodata section of the map file, this symbol is being treated as a global variable by the compiler. But this shouldn't be the case, because macros should be handled by the preprocessor before the compiler even parses the file. This macro should be replaced with its defined value before compiling.
Is this the standard way that GCC handles macros or is there some implementation specific reason that GCC would treat this as a global (debug setting maybe)? Also, what does this mean if my macro gets redefined in a different source file? Did I just redefine it for a single source file or did I modify a global variable, thereby changing _NVSize everywhere it's used within my program?
I think the compiler is free to assign your macro to a global variable as long as it ensures that this produces the exact same result as if it did a textual replacement.
During the compilation the compiler can mark this global specially to denote that it is a macro constant value, so no re-assignment is possible, no address can be taken, etc.
If you redefine the macro in your source, the compiler might not perform this transformation (and treat it as you'd expect: a pre-compile textual replacement), perform it on one of the different values (or on all of them, say, using different names for each occurrence), or do something else :)
Macros are substituted in the preprocessor step, the compiler only sees the substituted result. Thus if it sees the macro name, then my bet is that the macro wasn't defined at the point of usage. It is defined between the specific #define _NVSize and an #undef _NVSize. Redefining an existing macro without using an #undef first should result in a preprocessor error, AFAIR.
BTW, you shouldn't start your macro names with an underscore. These are reserved for the implementation.
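Since macros are replaced during preprocessing, each source file ends up with whatever definition was in effect in that file, independently of other files. The sketch below imitates the two source files from the question inside one file. It assumes the name NV_SIZE in place of _NVSize (to avoid a reserved identifier), the function names are hypothetical, and the #undef is only there to simulate the start of a new translation unit.

```c
/* ---- as in file_B.c: My_Header.h is included first ---- */
#define NV_SIZE 1024      /* stands in for the text of My_Header.h */
#ifndef NV_SIZE
#define NV_SIZE 1         /* skipped: NV_SIZE is already defined */
#endif
int nv_size_in_file_b(void) { return NV_SIZE; }

/* ---- as in file_A.c: no header, so the fallback applies ----
   The #undef only simulates starting a fresh translation unit. */
#undef NV_SIZE
#ifndef NV_SIZE
#define NV_SIZE 1
#endif
int nv_size_in_file_a(void) { return NV_SIZE; }
```

In the compiled functions the macro name no longer exists; only the constants 1024 and 1 remain.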

Resources