How to find all preprocessor dependencies of a specific line of code - C

Suppose there is a C/C++ header file with over ten million lines and countless #ifdef and #endif statements. What's the most efficient way to find all the preprocessor dependencies of an arbitrary line? In other words, how do I find all the preprocessor definitions that determine whether the compiler includes or ignores the block of code containing that line?
For example, we have the following code:
#ifdef A
#if defined(B)
#ifdef C
#else
#define X 1
#endif
#endif
#endif
In order for the compiler to include #define X 1, how do I know that I should define A and B but not C, without reading the code manually? Or is there an efficient method to find all the dependencies manually?

There is AFAIK no tool that can do this for you.
As mentioned in the comments, the correct solution is to consult the documentation. If this is some odd case where that is not an option, you may be able to work backwards by printing out the value of each macro you are unsure about. Here is a bash script I just cooked up that automates that process:
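deref.sh:
#!/bin/bash
# Print the value each named macro has at the end of <file>, or UNDEFINED.
if [ -z "$2" ]; then
    >&2 echo "usage: $0 <file> <macro name> [<macro name> ...]"
    exit 2
fi
source_file="$1"
shift
for macro in "$@"; do
    # Work on a copy next to the original so relative #includes still resolve
    play_file="$(mktemp "$(dirname "$source_file")/XXXXXX.c")"
    cat "$source_file" > "$play_file"
    # Append a probe whose last output line is the macro's expansion
    printf '\n#ifndef %s\nUNDEFINED\n#else\n%s\n#endif' "$macro" "$macro" >> "$play_file"
    printf '%s: %s\n' "$macro" "$(gcc -E "$play_file" | tail -n 1)"
    rm "$play_file"
done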
Usage example:
a.c:
#define X 1
#include <stdio.h>
int main(void)
{
printf("Hello World");
}
In the shell:
./deref.sh a.c X Y
X: 1
Y: UNDEFINED
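The script above reports macro values, but it does not tell you which conditions enclose a given line, which is what the question asks. A single pass that keeps a stack of open #if groups is enough for that. The Python sketch below is not an existing tool. The function names are hypothetical, and it assumes each directive fits on one line, with no backslash continuations and no trailing comments. It reports the conditions as text and does not try to simplify or evaluate them.

```python
import re

# A directive line: optional blanks, "#", optional blanks, a name, the rest.
DIRECTIVE = re.compile(r'^\s*#\s*(\w+)\s*(.*?)\s*$')


def branch_condition(prior, current):
    """Condition for the active branch of one #if group: every earlier
    branch was false and, unless this is an #else, its own test is true."""
    parts = [f"!({c})" for c in prior]
    if current is not None:
        parts.append(current)
    return " && ".join(parts)


def enclosing_conditions(lines, target):
    """Return the conditions that must all hold for 1-based line `target`
    to reach the compiler, outermost first."""
    stack = []  # one [earlier_branch_conditions, current_condition] per open #if group
    for lineno, text in enumerate(lines, start=1):
        if lineno == target:
            return [branch_condition(prior, cur) for prior, cur in stack]
        m = DIRECTIVE.match(text)
        if not m:
            continue
        name, arg = m.groups()
        if name == "if":
            stack.append([[], arg])
        elif name == "ifdef":
            stack.append([[], f"defined({arg})"])
        elif name == "ifndef":
            stack.append([[], f"!defined({arg})"])
        elif name == "elif":
            frame = stack[-1]
            frame[0].append(frame[1])  # the previous branch must have been false
            frame[1] = arg
        elif name == "else":
            frame = stack[-1]
            frame[0].append(frame[1])
            frame[1] = None  # #else has no test of its own
        elif name == "endif":
            stack.pop()
    raise ValueError(f"input has fewer than {target} lines")
```

For the question's example, enclosing_conditions(lines, 5) returns ["defined(A)", "defined(B)", "!(defined(C))"]: define A and B, and leave C undefined. Passing an open file object instead of a list streams the file, which matters for a ten-million-line header.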

Related

How to show 'preprocessed' code ignoring includes with GCC

I'd like to know if it's possible to output 'preprocessed' code with gcc but 'ignoring' (not expanding) includes.
For example, I have this main.c:
#include <stdio.h>
#define prn(s) printf("this is a macro for printing a string: %s\n", s);
int main(){
char str[5] = "test";
prn(str);
return 0;
}
I run gcc -E main.c -o out.c
I got:
/*
all stdio stuff
*/
int main(){
char str[5] = "test";
printf("this is a macro for printing a string: %s\n", str);
return 0;
}
I'd like to output only:
#include <stdio.h>
int main(){
char str[5] = "test";
printf("this is a macro for printing a string: %s\n", str);
return 0;
}
or, at least, just
int main(){
char str[5] = "test";
printf("this is a macro for printing a string: %s\n", str);
return 0;
}
PS: it would be great if it were possible to expand "local" ("") includes and not expand "global" (<>) includes.
I agree with Matteo Italia's comment that if you just prevent the #include directives from being expanded, then the resulting code won't represent what the compiler actually sees, and therefore it will be of limited use in troubleshooting.
Here's an idea to get around that. Add a variable declaration before and after your includes. Any variable that is reasonably unique will do.
int begin_includes_tag;
#include <stdio.h>
... other includes
int end_includes_tag;
Then you can do:
gcc -E main.c | sed '/begin_includes_tag/,/end_includes_tag/d' > out.c
The sed command will delete everything between those variable declarations.
When cpp expands includes it adds # directives (linemarkers) to trace back errors to the original files.
You can add a post-processing step (it can be trivially written in any scripting language, or even in C if you feel like it) that parses just the linemarkers and filters out the lines coming from files outside your project directory. Even better, one of the flags (3) marks system header files (stuff coming from paths provided through -isystem, either implicitly by the compiler driver or explicitly), so you could exploit that as well.
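For example, in Python 3:
#!/usr/bin/env python3
# Copy stdin to stdout, skipping the lines that come from system headers
# (those after a linemarker whose flags include 3).
import sys

skip = False
for l in sys.stdin:
    if not skip:
        sys.stdout.write(l)
    if l.startswith("# "):
        # Linemarker: # linenum "filename" flags...
        toks = l.strip().split(" ")
        linenum, filename = toks[1:3]
        flags = toks[3:]
        skip = "3" in flags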
Running gcc -E foo.c | ./filter.py, I get:
# 1 "foo.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "foo.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 4 "foo.c"
int main(){
char str[5] = "test";
printf("this is a macro for printing a string: %s\n", str);;
return 0;
}
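The script above relies on the system-header flag. The other option mentioned, keeping only lines from files inside your project directory, could be sketched in Python like this. keep_project_lines is a hypothetical name. The sketch assumes POSIX-style paths, because escape sequences in the marker's filename are not decoded. Relative names in the markers are resolved against the current directory, so it should run where gcc ran. Unlike a whitespace-splitting parser, it reads the quoted filename whole, so names containing spaces work.

```python
import os
import re

# A cpp linemarker: # linenum "filename" [flags]; the quoted name may contain
# escaped characters such as \\ or \", which are matched but not decoded.
LINEMARKER = re.compile(r'^# (\d+) "((?:[^"\\]|\\.)*)"((?: \d+)*)\s*$')


def keep_project_lines(lines, project_dir):
    """Yield the lines of `gcc -E` output that come from files under project_dir.

    Linemarkers themselves are dropped. Pseudo-files such as "<built-in>" and
    "<command-line>" count as outside the project.
    """
    project_dir = os.path.abspath(project_dir)
    keep = True
    for line in lines:
        m = LINEMARKER.match(line)
        if m:
            filename = m.group(2)
            if filename.startswith("<"):
                keep = False
            else:
                path = os.path.abspath(filename)
                keep = os.path.commonpath([path, project_dir]) == project_dir
            continue
        if keep:
            yield line
```

To use it as a filter, call sys.stdout.writelines(keep_project_lines(sys.stdin, ".")) in a script and pipe gcc -E foo.c into it.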
Protect the #includes from being expanded, run the preprocessor textually, remove the # 1 "<stdin>" etc. junk that the textual preprocessor generates, and re-expose the protected #includes.
This shell function does it:
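expand_cpp(){
    # Prefix each #include with a magic token so cpp passes it through untouched,
    # then strip the token again and delete cpp's "# <line> ..." linemarkers.
    sed 's|^\([ \t]*#[ \t]*include\)|magic_fjdsa9f8j932j9\1|' "$@" \
    | cpp | sed 's|^magic_fjdsa9f8j932j9||; /^# [0-9]/d'
}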
This works as long as you keep the word include together, instead of doing crazy stuff like
#i\
ncl\
u??/
de <iostream>
(Above are two backslash continuation lines plus one continuation made with the trigraph ??/, which stands for \.)
If you wish, you can protect #if, #ifdef, #ifndef, #else, and #endif directives the same way.
Applied to your example
example.c:
#include <stdio.h>
#define prn(s) printf("this is a macro for printing a string: %s\n", s);
int main(){
char str[5] = "test";
prn(str);
return 0;
}
Running either expand_cpp < example.c or expand_cpp example.c generates:
#include <stdio.h>
int main(){
char str[5] = "test";
printf("this is a macro for printing a string: %s\n", str);;
return 0;
}
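The same protect/run/unprotect idea can be written in Python, where protecting conditionals as well becomes a parameter. This is only a sketch. It assumes a cpp binary on PATH, and protect, unprotect, and expand_cpp are hypothetical names. If you protect #if and its relatives, cpp no longer evaluates them, so every #define in every branch takes effect.

```python
import re
import subprocess

# Any token that cannot occur in real code; this is the one the shell version uses.
MAGIC = "magic_fjdsa9f8j932j9"


def protect(src, directives=("include",)):
    """Prefix the given directives with MAGIC so cpp passes them through as text."""
    pattern = re.compile(
        r'^([ \t]*#[ \t]*(?:' + "|".join(directives) + r')\b)', re.MULTILINE)
    return pattern.sub(MAGIC + r'\g<1>', src)


def unprotect(preprocessed):
    """Drop cpp linemarkers and strip MAGIC from the protected directives."""
    out = []
    for line in preprocessed.splitlines(keepends=True):
        if re.match(r'# \d', line):
            continue  # a linemarker such as: # 1 "<stdin>"
        if line.startswith(MAGIC):
            line = line[len(MAGIC):]
        out.append(line)
    return "".join(out)


def expand_cpp(src, directives=("include",)):
    """Run cpp on the protected source and restore the protected directives."""
    result = subprocess.run(["cpp"], input=protect(src, directives),
                            capture_output=True, text=True, check=True)
    return unprotect(result.stdout)
```

expand_cpp(open("example.c").read()) should behave like the shell function. Passing ("include", "if", "ifdef", "ifndef", "elif", "else", "endif") as directives also keeps the conditionals.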
You can use -dI to show the #include directives and post-process the preprocessor output.
Assuming the name of your file is foo.c:
SOURCEFILE=foo.c
gcc -E -dI "$SOURCEFILE" | awk '
/^# [0-9]* "/ { if ($3 == "\"'"$SOURCEFILE"'\"") show=1; else show=0; }
{ if(show) print; }'
or to suppress all # line_number "file" lines for $SOURCEFILE:
SOURCEFILE=foo.c
gcc -E -dI "$SOURCEFILE" | awk '
/^# [0-9]* "/ { ignore = 1; if ($3 == "\"'"$SOURCEFILE"'\"") show=1; else show=0; }
{ if(ignore) ignore=0; else if(show) print; }'
Note: The AWK scripts do not work for file names that include whitespace. To handle file names with spaces you could modify the AWK script to compare $0 instead of $3.
Supposing the file is named c.c:
gcc -E c.c | tail -n +`gcc -E c.c | grep -n -e "#*\"c.c\"" | tail -1 | awk -F: '{print $1}'`
It seems that # <number> "c.c" marks the lines after each #include.
Of course, you can also save the output of gcc -E c.c to a file to avoid running it twice.
The advantage is that you don't need to modify the source or remove the #include lines before running gcc -E: this just removes all the lines from the top up to the last one produced by an #include... if I am right.
Many previous answers went in the direction of using the tracing # directives.
It's actually a one-liner in classical Unix (with awk):
gcc -E file.c | awk '/# [1-9][0-9]* "file.c"/ {skip=0; next} /# [1-9][0-9]* ".*"/ {skip=1} (skip<1) {print}'
TL;DR
Assign the file name to fname and run the following commands in the shell. Throughout this answer, fname is assumed to be an sh variable containing the source file to be processed.
fname=file_to_process.c ;
grep -G '^#include' <./"$fname" ;
grep -Gv '^#include[ ]*<' <./"$fname" | gcc -x c - -E -o - $(grep -G '^#include[ ]*<' <./"$fname" | xargs -I {} -- expr "{}" : '#include[ ]*<[ ]*\(.*\)[ ]*>' | xargs -I {} printf '-imacros %s ' "{}" ) | grep -Ev '^([ ]*|#.*)$'
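Everything except gcc here is plain POSIX sh with no bashisms; the only nonportable option is grep's -G (a GNU extension that selects basic regular expressions, which are the default anyway). The first grep outputs the #include directives.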
GCC's -imacros
From the GCC documentation:
-imacros file: Exactly like ‘-include’, except that any output produced by scanning file is
thrown away. Macros it defines remain defined. This allows you to acquire all
the macros from a header without also processing its declarations
So, what is -include anyway?
-include file: Process file as if #include "file" appeared as the first line of the primary
source file. However, the first directory searched for file is the preprocessor’s
working directory instead of the directory containing the main source file. If
not found there, it is searched for in the remainder of the #include "..."
search chain as normal.
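Simply speaking, because you cannot use <> or "" with the -include option, it behaves as if #include "file" were in the source code, except that the preprocessor's working directory is searched first. Headers such as assert.h are still found, because the rest of the "..." search chain includes the system directories.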
First approach
ANSI C guarantees that assert is a macro, so it is perfect for a simple test:
printf 'int main(){\nassert(1);\nreturn 0;}\n' | gcc -x c -E - -imacros assert.h
The options -x c and - tell gcc that the language is C and that the source file is read from stdin. The output doesn't contain any declarations from assert.h, but there is still some clutter, which can be cleaned up with grep:
printf 'int main(){\nassert(1);\nreturn 0;}\n' | gcc -x c -E - -imacros assert.h | grep -Ev '^([ ]*|#.*)$'
Note: in general, gcc won't expand tokens that are intended to be macros if their definition is missing. Nevertheless, assert happens to expand completely: __extension__ is a GCC keyword, __assert_fail is a function, and __PRETTY_FUNCTION__ is a compiler-provided identifier that holds the function name as a string.
Automation
The previous approach works, but it can be tedious:
1. each #include needs to be deleted from the file manually, and
2. it has to be added to the gcc call as an -imacros argument.
The first part is easy to script: pipe grep -Gv '^#include[ ]*<' <./"$fname" to gcc.
The second part takes a bit more work (at least without awk):
2.1 Drop the -v (negative matching) from the previous grep command: grep -G '^#include[ ]*<' <./"$fname"
2.2 Pipe that to expr inside xargs to extract the header name from each include directive: xargs -I {} -- expr "{}" : '#include[ ]*<[ ]*\(.*\)[ ]*>'
2.3 Pipe again to xargs, and printf each name with an -imacros prefix: xargs -I {} printf '-imacros %s ' "{}"
2.4 Enclose it all in a command substitution $(...) and place it in the gcc command line.
Done. This is how you end up with the lengthy command from the beginning of my answer.
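Steps 1 and 2.1 to 2.4 can be mirrored in Python, which avoids the nested xargs/expr pipeline. This is a sketch: split_global_includes and gcc_command are hypothetical helpers. The pattern is the same '^#include[ ]*<' the shell version uses, so indented or commented-out includes are not recognized.

```python
import re

# Same pattern as the shell pipeline: a global include at the start of a line.
GLOBAL_INCLUDE = re.compile(r'^#include[ ]*<[ ]*(.*?)[ ]*>')


def split_global_includes(source):
    """Steps 1, 2.1 and 2.2: return (header names, source without global includes)."""
    headers, kept = [], []
    for line in source.splitlines(keepends=True):
        m = GLOBAL_INCLUDE.match(line)
        if m:
            headers.append(m.group(1))
        else:
            kept.append(line)
    return headers, "".join(kept)


def gcc_command(headers):
    """Steps 2.3 and 2.4: gcc argv reading C from stdin, one -imacros per header."""
    cmd = ["gcc", "-x", "c", "-", "-E", "-o", "-"]
    for header in headers:
        cmd += ["-imacros", header]
    return cmd
```

To run it, use subprocess.run(gcc_command(headers), input=rest, capture_output=True, text=True). Then drop blank lines and lines starting with #, as the final grep -Ev does.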
Solving subtle problems
This solution still has a flaw: if local header files themselves contain global includes, those global includes will be expanded. One way to solve this problem is to use grep and sed to move all global includes out of the local headers and collect them at the top of each *.c file.
printf '' > std ;
for header in *.h ; do
grep -G '^#include[ ]*<' <./$header >> std ;
sed -i '/#include[ ]*</d' $header ;
done;
for source in *.c ; do
cat std > tmp;
cat $source >> tmp;
mv -f tmp $source ;
done
Now the processing script can be called on any *.c file in the current directory without worrying that anything from global includes will leak into the output. The final problem is duplication: local headers that themselves include other local headers might be duplicated, but only if the headers aren't guarded, and in general every header should always be guarded.
Final version and example
To show these scripts in action, here is a small demo:
File h1.h:
#ifndef H1H
#define H1H
#include <stdio.h>
#include <limits.h>
#define H1 printf("H1:%i\n", h1_int)
int h1_int=INT_MAX;
#endif
File h2.h:
#ifndef H2H
#define H2H
#include <stdio.h>
#include "h1.h"
#define H2 printf("H2:%i\n", h2_int)
int h2_int;
#endif
File main.c:
#include <assert.h>
#include "h1.h"
#include "h2.h"
int main(){
assert(1);
H1;
H2;
}
Final version of the script preproc.sh:
fname="$1"
printf '' > std ;
for source in *.[ch] ; do
grep -G '^#include[ ]*<' <./$source >> std ;
sed -i '/#include[ ]*</d' $source ;
sort -u std > std2;
mv -f std2 std;
done;
for source in *.c ; do
cat std > tmp;
cat $source >> tmp;
mv -f tmp $source ;
done
grep -G '^#include[ ]*<' <./"$fname" ;
grep -Gv '^#include[ ]*<' <./"$fname" | gcc -x c - -E -o - $(grep -G '^#include[ ]*<' <./"$fname" | xargs -I {} -- expr "{}" : '#include[ ]*<[ ]*\(.*\)[ ]*>' | xargs -I {} printf '-imacros %s ' "{}" ) | grep -Ev '^([ ]*|#.*)$'
Output of the call ./preproc.sh main.c:
#include <assert.h>
#include <limits.h>
#include <stdio.h>
int h1_int=0x7fffffff;
int h2_int;
int main(){
((void) sizeof ((
1
) ? 1 : 0), __extension__ ({ if (
1
) ; else __assert_fail (
"1"
, "<stdin>", 4, __extension__ __PRETTY_FUNCTION__); }))
;
printf("H1:%i\n", h1_int);
printf("H2:%i\n", h2_int);
}
This should always compile. If you really want to print every #include "file" as well, delete the < from the grep pattern '^#include[ ]*<' in the line of preproc.sh that prints the includes (grep -G '^#include[ ]*<' <./"$fname" ;). Be warned, though, that the contents of those headers will then be duplicated, and the code might fail to compile if the headers initialize variables. My example does this on purpose to illustrate the problem.
Summary
There are plenty of good answers here, so why yet another? Because this seems to be the only solution with the following properties:
Local includes are expanded
Global includes are discarded
Macros defined in either local or global includes are expanded
The approach is general enough to be usable not only with toy examples, but in small and medium projects that reside in a single directory.

simple script or commands to *substitute* stray "\\n" with "\n"

Alright, I understand that the title of this topic sounds a bit like gibberish... so I'll try to explain it as clearly as I can...
This is related to this previous post (an approach that's been verified to work):
multipass a source code to cpp
-- which basically asks cpp to preprocess the code once before starting the gcc compile/build process.
Take the previous post's sample code:
#include <stdio.h>
#define DEF_X #define X 22
int main(void)
{
DEF_X
printf("%u", X);
return 1;
}
Now, to be able to insert DEF_X freely anywhere, we need to add a newline.
This doesn't work:
#define DEF_X \
#define X 22
This still doesn't work, but is more likely to:
#define DEF_X \n \
#define X 22
If we get the latter to work, then thanks to C's free-form syntax and the concatenation of adjacent string literals, it works anywhere as far as C/C++ is concerned:
"literal_str0" DEF_X "literal_str1"
Now, when cpp preprocesses this:
# 1 "d:/Projects/Research/tests/test.c"
# 1 "<command-line>"
# 1 "d:/Projects/Research/test/test.c"
# 1 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 1 3
# 19 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 3
# 1 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 1 3
# 32 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 3
# 33 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 3
# 20 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 2 3
ETC_ETC_ETC_IGNORED_FOR_BREVITY_BUT_LOTS_OF_DECLARATIONS
int main(void)
{
\n #define X 22
printf("%u", X);
return 1;
}
We have a stray \n in our preprocessed file, so now the problem is to get rid of it...
Now, Unix system commands aren't really my strongest suit. I've compiled dozens of packages in Linux and written simple bash scripts that just run multiline commands (so I don't have to type them every time or keep pressing the up arrow to pick the right sequence of commands). So I don't know the finer points of stream piping and its arguments.
Having said that, I tried these commands:
cpp $MY_DIR/test.c | perl -p -e 's/\\n/\n/g' > $MY_DIR/test0.c
gcc $MY_DIR/test0.c -o test.exe
It works: it removes that stray \n.
Oh, as for using perl rather than sed, I'm just more familiar with Perl's flavor of regex... it's more consistent in my eyes.
Anyway, this has the nasty side effect of eating up every \n in the file (even in string literals)... so I need a script or a series of commands to
remove a \n only if:
it is not inside a quote -- so this won't be modified: "hell0_there\n"
it is not passed to a function call (inside the argument list);
this is safe, as one can never pass a single \n, which is neither a keyword nor an identifier.
If I need to "stringify" an expression with \n, I can simply call a function-like macro QUOTE_VAR(token), which covers every case where \n would have to be treated as a string.
This should cover all the cases where \n should be substituted... at least for my own coding conventions.
Really, I would do this on my own if I could... but my regex skills are extremely lacking; I only use regex for simple substitutions.
A better way is to replace \n only when it occurs at the beginning of a line.
The following command should do the work:
sed -e 's/^\s*\\n/\n/'
or only when it occurs before a #:
sed -e 's/\\n\s*#/\n#/g'
Alternatively, you can reverse the order of preprocessing: substitute DEF_X with your own tool before running the C preprocessor.
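The question's main rule, replacing \n only outside string and character literals, is awkward to express in one regex but easy to handle with a small scanner. A minimal Python sketch follows. It swaps a character scanner in for sed, the function name is hypothetical, and it ignores comments because cpp strips them unless you pass -C. The function-call rule is not implemented since, as the question notes, a lone \n can never be a valid argument.

```python
def replace_stray_newlines(text):
    """Replace each two-character sequence backslash-n that lies outside a
    string or character literal with a real newline."""
    out = []
    quote = None  # the opening quote character while inside a literal
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])  # copy the escaped character verbatim
                i += 2
                continue
            if ch == quote or ch == "\n":
                quote = None  # literal closed (or unterminated at end of line)
        elif ch in "\"'":
            quote = ch
            out.append(ch)
        elif text.startswith("\\n", i):
            out.append("\n")  # the stray \n left behind by DEF_X
            i += 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Wrapped in a script that reads sys.stdin and writes the result to sys.stdout, it can take the place of the perl step, for example cpp $MY_DIR/test.c | python3 fix_newlines.py > $MY_DIR/test0.c (the script name is hypothetical).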

Print all defined macros

I'm attempting to refactor a piece of legacy code and I'd like a snapshot of all of the macros defined at a certain point in the source. The code imports a ridiculous number of headers etc. and it's a bit tedious to track them down by hand.
Something like
#define FOO 1
int myFunc(...) {
PRINT_ALL_DEFINED_THINGS(stderr)
/* ... */
}
Expected somewhere in the output
MACRO: "FOO" value 1
I'm using gcc but have access to other compilers if they are easier to accomplish this task.
EDIT:
The linked question does not give me the correct output for this:
#include <stdio.h>
#define FOO 1
int main(void) {
printf("%d\n", FOO);
}
#define FOO 0
This very clearly prints 1 when run, but gcc test.c -E -dM | grep FOO gives me 0
To dump all defines you can run:
gcc -dM -E file.c
Check GCC dump preprocessor defines
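Every define it dumps has the value it was given last (the final definition or redefinition), so you won't be able to see the value a macro has at a particular point in the code.
You can also add the option -Wunused-macros, which warns about macros defined in the main file that are never used; redefining a macro with a different value already produces a warning by default.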
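Since -dM only reports the final state, one workaround for the question's "snapshot at a certain point" is to feed gcc only the lines before that point. Below is a Python sketch. It assumes gcc is on PATH and that the file is C (use -x c++ for C++), and parse_dm and macros_before_line are hypothetical names. The cut must not fall inside an unterminated #if group, otherwise gcc reports an error.

```python
import os
import subprocess


def parse_dm(output):
    """Turn `gcc -dM -E` output into {name: replacement text}.

    Function-like macros keep their parameter list in the key, e.g. "MAX(a,b)",
    which assumes gcc prints that list without spaces, as it normally does.
    """
    macros = {}
    for line in output.splitlines():
        if line.startswith("#define "):
            name, _, value = line[len("#define "):].partition(" ")
            macros[name] = value
    return macros


def macros_before_line(path, line, compiler="gcc"):
    """Snapshot of the macros defined just before 1-based `line` of `path`.

    Raises subprocess.CalledProcessError if the compiler rejects the cut,
    for example because it falls inside an unterminated #if group.
    """
    with open(path) as f:
        prefix = "".join(f.readlines()[: line - 1])
    # The prefix is fed through stdin, so point quoted includes at the file's directory.
    source_dir = os.path.dirname(os.path.abspath(path))
    result = subprocess.run(
        [compiler, "-x", "c", "-E", "-dM", "-iquote", source_dir, "-"],
        input=prefix, capture_output=True, text=True, check=True)
    return parse_dm(result.stdout)
```

For the test.c in the EDIT, macros_before_line("test.c", 4) cuts just before the printf line, so FOO should come back as "1" rather than "0".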

One-liner for printing out the value of a macro from a header

I have a header that defines a large number of macros, some of which depend on other macros -- however, the dependencies are all resolved within this header.
I need a one-liner for printing out the value of a macro defined in that header.
As an example:
#define MACRO_A 0x60000000
#define MACRO_B MACRO_A + 0x00010000
//...
As a first attempt:
echo MACRO_B | ${CPREPROCESSOR} --include /path/to/header
... which nearly gives me what I want:
# A number of lines that are not important
# ...
0x60000000 + 0x00010000
... however, I'm trying to keep this from ballooning into a huge sequence of "pipe it to this, then pipe it to that ...".
I've also tried this:
echo 'main(){ printf( "0x%X", MACRO_B ); }' \
| ${CPREPROCESSOR} --include /path/to/header --include /usr/include/stdio.h
... but it (the gcc compiler) complains that -E is required when processing code on standard input, so I end up having to write out to a temporary file to compile/run this.
Is there a better way?
-Brian
echo 'int main(){ printf( "0x%X", MACRO_B ); }' \
| gcc -x c --include /path/to/header --include /usr/include/stdio.h - && ./a.out
will do it in one line.
(You misread the error GCC gives when reading from stdin: you need either -E or -x, because -x tells GCC which language to expect.)
Also, main should return int: void main() is non-standard, and dropping the return type entirely relies on implicit int, which C99 removed and recent GCC versions reject by default. You also don't need to specify the path for stdio.h.
So, slightly shorter:
echo 'int main(){printf("0x%X",MACRO_B);}' \
| gcc -xc --include /path/to/header --include stdio.h - && ./a.out
What about tail -n1? Like this:
$ echo C_IRUSR | cpp --include /usr/include/cpio.h | tail -n 1
000400
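tail -n 1 gives one value per cpp run. Prefixing each name with itself as a string literal (the same trick as the "EOF" EOF example) lets a single run report several macros at once. Below is a Python sketch. It assumes cpp is on PATH, and the function names are hypothetical. A macro that is not defined comes back unchanged, so its value equals its name.

```python
import re
import subprocess


def probe_input(names):
    """One line per macro: its name as a string literal, then the macro itself."""
    return "".join(f'"{name}" {name}\n' for name in names)


def parse_probe_output(output, names):
    """Map each requested name to its expansion, read from cpp's output."""
    wanted = set(names)
    values = {}
    for line in output.splitlines():
        m = re.match(r'"(\w+)" (.*)$', line)
        if m and m.group(1) in wanted:
            values[m.group(1)] = m.group(2)
    return values


def macro_values(header, names):
    """Expand several macros from `header` with a single cpp run (-P drops linemarkers)."""
    result = subprocess.run(["cpp", "-P", "-include", header],
                            input=probe_input(names),
                            capture_output=True, text=True, check=True)
    return parse_probe_output(result.stdout, names)
```

For the header in the question, macro_values("/path/to/header", ["MACRO_B"]) should return {"MACRO_B": "0x60000000 + 0x00010000"}.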
How about artificially generating a compile error whose message contains the value of MACRO_B?
I think the easiest way would be to write a small C program that includes the header and prints the desired output. Then you can use it in your script, makefile, or whatever.
echo '"EOF" EOF' | cpp --include /usr/include/stdio.h | grep EOF
prints:
"EOF" (-1)

How can I generate a list of #define values from C code?

I have code that has a lot of complicated #define error codes that are not easy to decode since they are nested through several levels.
Is there any elegant way I can get a list of #defines with their final numerical values (or whatever else they may be)?
As an example:
<header1.h>
#define CREATE_ERROR_CODE(class, sc, code) ((class << 16) | (sc << 8) | code)
#define EMI_MAX 16
<header2.h>
#define MI_1 EMI_MAX
<header3.h>
#define MODULE_ERROR_CLASS MI_1
#define MODULE_ERROR_SUBCLASS 1
#define ERROR_FOO CREATE_ERROR_CODE(MODULE_ERROR_CLASS, MODULE_ERROR_SUBCLASS, 1)
I would have a large number of similar #defines matching ERROR_[\w_]+ that I'd like to enumerate so that I always have a current list of error codes that the program can output. I need the numerical value because that's all the program will print out (and no, it's not an option to print out a string instead).
Suggestions for gcc or any other compiler would be helpful.
GCC's -dM preprocessor option might get you what you want.
I think the solution is a combo of @nmichaels's and @aschepler's answers.
Use gcc's -dM option to get a list of the macros.
Use perl or awk or whatever to create 2 files from this list:
1) Macros.h, containing just the #defines.
2) Codes.c, which contains
#include "Macros.h"
ERROR_FOO = "ERROR_FOO"
ERROR_BAR = "ERROR_BAR"
(i.e. turn each #define ERROR_x into a line containing the macro and its name as a string.)
Now run gcc -E Codes.c. That should output the file with all the macros expanded. The output should look something like
1 = "ERROR_FOO"
2 = "ERROR_BAR"
I don't have gcc handy, so haven't tested this...
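The "use perl or awk or whatever" step could look like this in Python. It takes the output of gcc -dM -E on your sources and returns the texts of Macros.h and Codes.c. It is a sketch in the same spirit as the answer. make_macro_files is a hypothetical name, and skipping names that begin with __ is an assumption meant to keep compiler-predefined macros out of Macros.h.

```python
import re

# Object-like ERROR_ macros only: the name must be followed by a space or end of line.
ERROR_DEFINE = re.compile(r'^#define (ERROR_\w+)(?: |$)')


def make_macro_files(dm_output):
    """Build the text of Macros.h and Codes.c from `gcc -dM -E` output."""
    defines = [line for line in dm_output.splitlines()
               if line.startswith("#define ") and not line.startswith("#define __")]
    macros_h = "".join(line + "\n" for line in defines)
    codes_c = '#include "Macros.h"\n'
    for line in defines:
        m = ERROR_DEFINE.match(line)
        if m:
            # The macro expands; the string literal keeps the name.
            codes_c += f'{m.group(1)} = "{m.group(1)}"\n'
    return macros_h, codes_c
```

Write the two strings to Macros.h and Codes.c, then run gcc -E Codes.c as described above.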
The program 'coan' looks like the tool you are after. It has the 'defs' sub-command, which is described as:
defs [OPTION...] [file...] [directory...]
Select #define and #undef directives from the input files in accordance with the options and report them on the standard output in accordance with the options.
See the cited URL for more information about the options. Obtain the code here.
If you have a complete list of the macros you want to see, and all are numeric, you can compile and run a short program just for this purpose:
#include <header3.h>
#include <stdio.h>
#define SHOW(x) printf(#x " = %lld\n", (long long int) x)
int main(void) {
SHOW(ERROR_FOO);
/*...*/
return 0;
}
As @nmichaels mentioned, gcc's -d flags may help get that list of macros to show.
Here's a little creative solution:
Write a program to match all of your identifiers with a regular expression (like \#define :b+(?<NAME>[0-9_A-Za-z]+):b+(?<VALUE>[^(].+)$ in .NET), then have it create another C file with just the names matched:
void main() {
/*my_define_1*/ my_define_1;
/*my_define_2*/ my_define_2;
//...
}
Then preprocess your file using the /C and /P options (for VC++), and you should get all of those replaced with their values. Then use another regex to swap things around, putting the names from the comments in front of the values in #define format -- and now you have the list of #defines!
(You can do something similar with GCC.)
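With GCC, the same two-regex idea might look like the Python sketch below. GCC's -C flag keeps comments in the preprocessed output and plays the role of /C. The helper names and the probe function name are hypothetical, and only object-like macros with a non-empty value are extracted. The sketch also assumes each expansion stays on one line after preprocessing.

```python
import re

# Object-like #define: the name is followed by blanks and a non-blank value.
DEFINE_NAME = re.compile(r'^[ \t]*#[ \t]*define[ \t]+(\w+)[ \t]+\S', re.MULTILINE)
# A probe line after preprocessing with comments kept: /*NAME*/ value;
PROBE_LINE = re.compile(r'^/\*(\w+)\*/ (.*);$', re.MULTILINE)


def extract_names(source):
    """First regex: names of object-like macros defined in `source`."""
    return DEFINE_NAME.findall(source)


def make_probe(header, names):
    """The generated C file: one use of each macro, tagged with its name in a comment."""
    body = "".join(f"/*{name}*/ {name};\n" for name in names)
    return f'#include "{header}"\nvoid probe(void) {{\n{body}}}\n'


def parse_probe(preprocessed):
    """Second regex: swap each probe line back into #define form."""
    return [f"#define {name} {value}"
            for name, value in PROBE_LINE.findall(preprocessed)]
```

Write the result of make_probe to a file, run gcc -E -C on it, and pass the output to parse_probe.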
Is there any elegant way I can get a list of #defines with their final numerical values
For various levels of elegance, sort of.
#!/bin/bash
file="mount.c";
for macro in $(grep -Po '(?<=#define)\s+(\S+)' "$file"); do
echo -en "$macro: ";
echo -en '#include "'"$file"'"\n'"$macro\n" | \
cpp -E -P -x c ${CPPFLAGS} - | tail -n1;
done;
Not foolproof (#define \ \n macro(x) ... would not be caught - but no style I've seen does that).

Resources