How to read the C-preprocessor file [duplicate] - c

This question already has answers here:
What is the meaning of lines starting with a hash sign and number like '# 1 "a.c"' in the gcc preprocessor output?
(3 answers)
Closed 3 years ago.
I am calling on the crowd to help me understand how to read the preprocessor output. I am attempting to go through an exercise of going through the compilation process of a simple C application on Ubuntu 18.04.
The code simpler.c
#include "simpler.h"
int main()
{
// This is a comment
return 0;
}
for simpler.h
int y;
I then run the command
username$ cpp simpler.c simpler_cpp
This then produces the preprocessed c file as follows
# 1 "simpler.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "simpler.c"
# 1 "simpler.h" 1
int y;
# 3 "simpler.c" 2
int main()
{
return 0;
}
Looking this over I am not sure I follow how to read this file? Or at least put it in terms I understand. I do however see that my comment is not there any more, and that I have my line from my header file there. But other than that this is not too clear.
I am attempting to tell myself a "story" with this file, such as "The # 1 is an preprocessor index of the file, so there are # 1, # 31, # 32, and # 3 or 4 files total". That the "# 31 "" means something... I really don't know what?
If anyone can help me interpret this file, I would greatly appreciate it.
I have been attempting to follow the page http://gcc.gnu.org/onlinedocs/cpp/index.html#Top
but it reads more like an encyclopedia, which although good if you know the road map, if you are starting from square one it becomes more challenging.
Possible Answer:
Thanks for the responses guys,
according to https://gcc.gnu.org/onlinedocs/gcc-9.1.0/cpp/Preprocessor-Output.html#Preprocessor-Output
when reading the output of the preprocessed c file, first macros are expanded, comments are removed, and long runs of blanks lines are discarded.
I read the line
# 1 "simpler.c"
is a linemarker and means that the following line originated in file "simpler.c" at line 1.
After the file there are no flags, which is not described in the file.
The other thing I can do is look at the line
# 1 "simpler.c"
and say the file simpler.c exists, and then ignore it. Which is probably the most practical thing to do. Except I wonder what the
# 1 "built-in"
# 1 "command-line"
mean?
if I ignore these then I get something that looks like
int y;
int main()
{
return 0;
}
Which is what I originally expected from the description of cpp.
Last edit for the day. One thing I have found is the command
cpp -P simpler.c simpler_cpp
give the output
int y;
int main()
{
return 0;
}
There is a no #line flag in the man page that outputs the preprocessed file without any line information. I am guessing that this is really the only output that does matter. I am guessing that this output should have a .i extension, but I don' tknow . Oh well, I hope this is useful to anyone else. If I find any good information out there I will try and write something up.

The lines starting with # exist so the compiler can report errors on the correct file and line. The best thing to do is ignore them . In between them is C code with all macros and include files expanded. Typically one reads preprocessor output to debug macros, but your sample has none.
I guess this is a homework assignment to demonstrate include files.

Related

Joining lines in C [Strange output not compatible with documentation]

I started coding C in vim and I have some problems.
The backslash is intended to join lines but when I try to write:
ret\
urn 0;
I get
return
0;
and when I add spaces before urn; it stay like that without join.
ret\
urn 0;
it stay like that.
why in the second case I don't get return 0; but
ret
urn 0;
code:
CPP output:
command:
gcc -E -Wall -Wextra -Wimplicit -pedantic -std=c99 main.c -o output.i
GCC 5.4,
Vim 7.4
-E output is not officially specified by the standard. It's an engineering tradeoff among several different design constraints, of which the relevant two are:
whitespace must be inserted or deleted as necessary so that the "compiler proper" (imagine feeding the -E output back into gcc -fpreprocessed — which is what -save-temps does) sees the same sequence of pp-tokens that it would normally (without -E). (See C99 section 6.4 for the definition of a pp-token.)
to the maximum extent possible, tokens should appear at the same line and column position that they did in the original source code, so that error messages and debugging information are as accurate as possible.
Here's how this applies to your examples:
ret\
urn 0;
The backslash-newline combines ret and urn into a single pp-token, which must therefore appear all together on one line in the output. The 0 and ;, however, should continue to be on their original line and column so that diagnostics are accurate. So you get
return
0;
with spaces inserted to keep the 0 in its original column.
ret\
urn 0;
Here the backslash-newline is immediately followed by whitespace, so ret and urn do not have to be combined, so, again, the diagnostics are most accurate if everything stays where it originally was, and the output is
ret
urn 0;
which looks like the backslash-newline had no effect at all.
You might find the output of gcc -E -P less surprising. -P tells the preprocessor not to bother trying to preserve token position (and also turns off all those lines beginning with # in the output). Your examples produce return 0; and ret urn 0;, both all on one line, in -P mode.
Finally, a word of advice: everyone who ever has to read your code (and that includes yourself six months later) will appreciate it if you never split a token in the middle with backslash-newline, except for very long string literals. It's a legacy misfeature that wouldn't be included in the language if it were designed from scratch today.
The white space is a token separator. Just because you split the line doesn't mean a white space will be ignored.
What the compiler sees is something like ret urn;. Which is not valid C, since it's two tokens which probably weren't defined before, nor are they in a valid expression.
Keywords must be written as a single token with no spaces.
Now, when you do :
ret\
urn;
The backslash followed by a newline is removed in the early translation phases, and the subsequent line is appended. If the line has no white spaces at the beginning, the result is a valid token that the compiler understands as the keyword return.
Long story short, you seem to be asking about specific behavior for GCC. It seems like a compiler bug. Since clang does the expected thing (although the line count remains the same):
clang -E -Wall -Wextra -Wimplicit -pedantic -std=c99 -x c main.cpp
# 1 "main.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 316 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.cpp" 2
int main(void) {
ret urn 0;
}
It doesn't seem crucial however, since in this particular case the code will be invalid either way.
The behavior of the C preprocessor on \ followed by a newline is to remove both bytes from the input. This is done in a very early phase of the parsing. Yet the preprocessor retains the original line number for each token it sees and tries to output tokens on separate lines for the compiler to issue correct diagnostics for later phases of compilation.
For the input:
ret\
urn 1;
it may produce:
#line 1 "myfile.c"
return
#line 2 "myfile.c"
1;
Which it may shorten as
return
1;
Note that you can split any input line at any position with an escaped newline:
#inclu\
de <st\
dio.h>\
"Hello word\\
n"
for (i = 0; i < n; i+\
+)
ret\
\
\
urn;
\
r\
et\
urn\
123;\

Can ctags search for conditional loops in the code

With ctags , one can search for functions, variables, structures and what not in the code, for e.g.. I wanted to get the line numbers where all conditional loops are called in the code.
For e.g.:
1 #include <stdio.h>
2
3 void funcA() {}
4 void funcB(int a){}
5
6 int main() {
7 int a = 0;
8
9 if(a == 1)
10 {
11 funcA();
12 }
13 else
14 {
15 funcB(a);
16 }
17
18 while(1);
19
20 return 0;
21 }
22
In the example code snippet, with ctags command options, one can find out
funcA # line #3
funcB # line #4
Is there any option in ctags to find 'if' loop called at line number 9, 'else' # line #13. Likewise, 'while' # line #18 ?
If not ctags, any other tool to parse through code to find out such conditionals loops? Writing own parser is another alternative, but then figuring out keywords whether in comments can get challenging.
You should be able to do this with Exuberant Ctags if you're willing to write your own regular expressions. See --regex-<LANG> under Options in the manual.
As an alternative, you could try libclang to parse your code into an abstract syntax tree (AST), and find the interesting elements programmatically.

C preprocessor directives in preprocessed output [duplicate]

This question already has answers here:
What is the meaning of lines starting with a hash sign and number like '# 1 "a.c"' in the gcc preprocessor output?
(3 answers)
Closed 7 years ago.
These are the first few lines of the pre-processor output of a simple C program. What do they mean?
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 325 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
# 1 "some_path/stdio.h" 1 3 4
# 64 "some_path/stdio.h" 3 4
Here's my program:
#include <stdio.h>
int main()
{
printf("Hello, World!\n");
return 0;
}
# linenum filename flags
These are called linemarkers. They are inserted as needed into the output (but never within a string or character constant). They mean that the following line originated in file filename at line linenum. filename will never contain any non-printing characters; they are replaced with octal escape sequences.
After the file name comes zero or more flags, which are ‘1’, ‘2’, ‘3’, or ‘4’. If there are multiple flags, spaces separate them. Here is what the flags mean:
‘1’ This indicates the start of a new file.
‘2’ This indicates returning to a file (after having included another file).
‘3’ This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
‘4’ This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.
Source: GCC Manual

simple script or commands to *substitute* stray "\\n" with "\n"

alright, i understand that the title of this topic sounds a bit gibberish... so i'll try to explain it as clearly as i can...
this is related to this previous post (an approach that's been verified to work):
multipass a source code to cpp
-- which basically asks the cpp to preprocess the code once before starting the gcc compile build process
take the previous post's sample code:
#include <stdio.h>
#define DEF_X #define X 22
int main(void)
{
DEF_X
printf("%u", X);
return 1;
}
now, to be able to freely insert the DEF_X anywhere, we need to add a newline
this doesn't work:
#define DEF_X \
#define X 22
this still doesn't work, but is more likely to:
#define DEF_X \n \
#define X 22
if we get the latter above to work, thanks to C's free form syntax and constant string multiline concatenation, it works anywhere as far as C/C++ is concerned:
"literal_str0" DEF_X "literal_str1"
now when cpp preprocesses this:
# 1 "d:/Projects/Research/tests/test.c"
# 1 "<command-line>"
# 1 "d:/Projects/Research/test/test.c"
# 1 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 1 3
# 19 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 3
# 1 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 1 3
# 32 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 3=
# 33 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 3
# 20 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 2 3
ETC_ETC_ETC_IGNORED_FOR_BREVITY_BUT_LOTS_OF_DECLARATIONS
int main(void)
{
\n #define X 22
printf("%u", X);
return 1;
}
we have a stray \n in our preprocessed file. so now the problem is to get rid of it....
now, the unix system commands aren't really my strongest suit. i've compiled dozens of packages in linux and written simple bash scripts that simply enter multiline commands (so i don't have to type them every time or keep pressing the up arrow and choose the correct command successions). so i don`t know the finer points of stream piping and their arguments.
having said that, i tried these commands:
cpp $MY_DIR/test.c | perl -p -e 's/\\n/\n/g' > $MY_DIR/test0.c
gcc $MY_DIR/test0.c -o test.exe
it works, it removes that stray \n.
ohh, as to using perl rather than sed, i'm just more familiar with perl's variant to regex... it's more consistent in my eyes.
anyways, this has the nasty side effect of eating up any \n in the file (even in string literals)... so i need a script or a series of commands to:
remove a \n if:
if it is not inside a quote -- so this won't be modified: "hell0_there\n"
not passed to a function call (inside the argument list)
this is safe as one can never pass a single \n, which is neither a keyword nor an identifier.
if i need to "stringify" an expression with \n, i can simply call a function macro QUOTE_VAR(token). so that encapsulates all instances that \n would have to be treated as a string.
this should cover all cases that \n should be substituted... at least for my own coding conventions.
really, i would do this if i could manage it on my own... but my skills in regex is extremely lacking, only using it in for simple substitutions.
The better way is to replace \n if it occurs in the beginning of line.
The following command should do the work:
sed -e 's/\s*\\n/\n/g'
or occurs before #
sed -e 's/\\n\s*#/\n#/g'
or you can reverse the order of preprocessing and substitute DEF_X with your own tool before C preprocessor.

cpp preprocessor output not able to understand? [duplicate]

This question already has answers here:
What is the meaning of lines starting with a hash sign and number like '# 1 "a.c"' in the gcc preprocessor output?
(3 answers)
Closed 2 years ago.
Sorry if my question is very basic. I would like to understand the output produced by the preprocessor cpp. Let's say i have a very basic following program.
#include <stdio.h>
#include <stdlib.h>
int x=100;
int main ()
{
printf ("\n Welcome..\n");
}
I execute the following command.
cpp main.c main.i
in main.i
# 1 "/usr/include/stdio.h" 1 3 4
What is the meaning of the above line ?..
The gcc documentation explains the C preprocessor output aptly.
Here are the relevant sections:
The output from the C preprocessor looks much like the input, except that all preprocessing directive lines have been replaced with blank lines and all comments with spaces. Long runs of blank lines are discarded.
Source file name and line number information is conveyed by lines of the form
# linenum filename flags
These are called linemarkers. They are inserted as needed into the output (but never within a string or character constant). They mean that the following line originated in file filename at line linenum. filename will never contain any non-printing characters; they are replaced with octal escape sequences.
After the file name comes zero or more flags, which are 1, 2, 3, or 4. If there are multiple flags, spaces separate them. Here is what the flags mean:
1 This indicates the start of a new file.
2
This indicates returning to a file (after having included another file).
3
This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
4
This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.

Resources