This question already has answers here:
What is the meaning of lines starting with a hash sign and number like '# 1 "a.c"' in the gcc preprocessor output?
(3 answers)
Closed 2 years ago.
Sorry if my question is very basic. I would like to understand the output produced by the preprocessor cpp. Let's say I have the following very basic program.
#include <stdio.h>
#include <stdlib.h>
int x=100;
int main ()
{
printf ("\n Welcome..\n");
}
I execute the following command.
cpp main.c main.i
In main.i I see:
# 1 "/usr/include/stdio.h" 1 3 4
What is the meaning of the above line?
The gcc documentation explains the C preprocessor output aptly.
Here are the relevant sections:
The output from the C preprocessor looks much like the input, except that all preprocessing directive lines have been replaced with blank lines and all comments with spaces. Long runs of blank lines are discarded.
Source file name and line number information is conveyed by lines of the form
# linenum filename flags
These are called linemarkers. They are inserted as needed into the output (but never within a string or character constant). They mean that the following line originated in file filename at line linenum. filename will never contain any non-printing characters; they are replaced with octal escape sequences.
After the file name comes zero or more flags, which are 1, 2, 3, or 4. If there are multiple flags, spaces separate them. Here is what the flags mean:
1 This indicates the start of a new file.
2 This indicates returning to a file (after having included another file).
3 This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
4 This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.
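To make the format concrete, here is a minimal sketch in C that decodes one linemarker into its line number, file name, and flags. The struct and function names (linemarker, parse_linemarker) are made up for this illustration, and the parser assumes the file name contains no escaped quotes.

```c
#include <stdio.h>
#include <string.h>

/* Decoded form of one linemarker, e.g.  # 1 "/usr/include/stdio.h" 1 3 4 */
struct linemarker {
    long line;           /* line number that the following text maps to */
    char file[256];      /* file name without the surrounding quotes */
    int new_file;        /* flag 1: start of a new file */
    int returning;       /* flag 2: returning to a file after an include */
    int system_header;   /* flag 3: text comes from a system header */
    int extern_c;        /* flag 4: implicit extern "C" block */
};

/* Returns 1 and fills *out if text is a linemarker, 0 otherwise.
   Simplification: escaped quotes inside the file name are not handled. */
int parse_linemarker(const char *text, struct linemarker *out)
{
    int consumed = 0;

    memset(out, 0, sizeof *out);
    if (sscanf(text, "# %ld \"%255[^\"]\"%n",
               &out->line, out->file, &consumed) != 2 || consumed == 0)
        return 0;

    /* Everything after the closing quote is the space-separated flag list. */
    for (const char *p = text + consumed; *p != '\0'; p++) {
        switch (*p) {
        case '1': out->new_file = 1; break;
        case '2': out->returning = 1; break;
        case '3': out->system_header = 1; break;
        case '4': out->extern_c = 1; break;
        default: break;   /* spaces and anything unexpected are skipped */
        }
    }
    return 1;
}
```

Given the line from the question, # 1 "/usr/include/stdio.h" 1 3 4, this sets line to 1 and raises the flags for 1, 3, and 4: the start of a new file, a system header, and implicit extern "C" treatment.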
I am calling on the crowd to help me understand how to read the preprocessor output. I am working through an exercise on the compilation process of a simple C application on Ubuntu 18.04.
The code simpler.c
#include "simpler.h"
int main()
{
// This is a comment
return 0;
}
for simpler.h
int y;
I then run the command
username$ cpp simpler.c simpler_cpp
This then produces the preprocessed c file as follows
# 1 "simpler.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "simpler.c"
# 1 "simpler.h" 1
int y;
# 3 "simpler.c" 2
int main()
{
return 0;
}
Looking this over I am not sure I follow how to read this file? Or at least put it in terms I understand. I do however see that my comment is not there any more, and that I have my line from my header file there. But other than that this is not too clear.
I am attempting to tell myself a "story" with this file, such as "The # 1 is a preprocessor index of the file, so there are # 1, # 31, # 32, and # 3 or 4 files total". And the # 31 "<command-line>" line means something, but I really don't know what.
If anyone can help me interpret this file, I would greatly appreciate it.
I have been attempting to follow the page http://gcc.gnu.org/onlinedocs/cpp/index.html#Top
but it reads more like an encyclopedia, which although good if you know the road map, if you are starting from square one it becomes more challenging.
Possible Answer:
Thanks for the responses guys,
according to https://gcc.gnu.org/onlinedocs/gcc-9.1.0/cpp/Preprocessor-Output.html#Preprocessor-Output
when reading the output of the preprocessed C file, first macros are expanded, comments are removed, and long runs of blank lines are discarded.
I read the line
# 1 "simpler.c"
is a linemarker and means that the following line originated in file "simpler.c" at line 1.
After the file name there are no flags, a case I did not find described in the documentation.
The other thing I can do is look at the line
# 1 "simpler.c"
and say the file simpler.c exists, and then ignore it, which is probably the most practical thing to do. Except I wonder what the
# 1 "<built-in>"
# 1 "<command-line>"
mean?
If I ignore these then I get something that looks like
int y;
int main()
{
return 0;
}
Which is what I originally expected from the description of cpp.
Last edit for the day. One thing I have found is the command
cpp -P simpler.c simpler_cpp
gives the output
int y;
int main()
{
return 0;
}
The man page describes -P as the flag that inhibits generation of linemarkers, so it outputs the preprocessed file without any line information. I am guessing that this is really the only output that matters. I am guessing that this output should have a .i extension, but I don't know. Oh well, I hope this is useful to anyone else. If I find any good information out there I will try and write something up.
The lines starting with # exist so the compiler can report errors on the correct file and line. The best thing to do is ignore them. In between them is C code with all macros and include files expanded. Typically one reads preprocessor output to debug macros, but your sample has none.
I guess this is a homework assignment to demonstrate include files.
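As an illustration of how this file and line information reaches the compiler, the standard #line directive has a comparable effect in ordinary source: it resets what the compiler takes as the current line and file. A minimal sketch, where the file name virtual.c and both function names are invented for the example:

```c
#include <string.h>

/* The directive below makes the compiler treat the next source line as
   line 100 of a file called "virtual.c" (a made-up name). */
#line 100 "virtual.c"
int reported_line(void) { return __LINE__; }
int file_is_virtual(void) { return strcmp(__FILE__, "virtual.c") == 0; }
```

Here reported_line() returns 100 and file_is_virtual() returns 1, regardless of where the code actually sits in the real file. Any diagnostic for these lines would be attributed to virtual.c in the same way.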
Comments are usually converted to a single whitespace character before the preprocessor runs. However, there is a compelling use case for generating comments from a macro.
#pragma once
#ifdef DOXYGEN
#define DALT(t,f) t
#else
#define DALT(t,f) f
#endif
#define MAP(n,a,d) \
DALT ( COMMENT(| n | a | d |) \
, void* mm_##n = a \
)
/// Memory map table
/// | name | address | description |
/// |------|---------|-------------|
MAP (reg0 , 0 , foo )
MAP (reg1 , 8 , bar )
In this example, when the DOXYGEN flag is set, I want to generate doxygen markup from the macro. When it isn't, I want to generate the variables. In this instance, the desired behaviour is to generate comments in the macros. Any thoughts about how?
I've tried /##/ and another example with more indirection:
#define COMMENT SLASH(/)
#define SLASH(s) /##s
Neither works.
In doxygen it is possible to run commands on the sources before they are fed into the doxygen kernel. The Doxyfile offers several FILTER settings for this. In this case, the INPUT_FILTER line should read:
INPUT_FILTER = "sed -e 's%^ *MAP *(\([^,]*\),\([^,]*\),\([^)]*\))%/// | \1 | \2 | \3 |%'"
Furthermore, the entire #if construct can disappear, and one probably just needs:
#define MAP(n,a,d) void* mm_##n = a
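A compilable sketch of that simplified macro might look like the following. The explicit cast through uintptr_t and the trailing semicolons are additions made here so that a nonzero integer address converts to a pointer cleanly; they are assumptions about the intent, not part of the original macro.

```c
#include <stdint.h>

/* Same shape as the simplified MAP: the ## operator pastes the register name
   onto the mm_ prefix. The description argument d is simply dropped. */
#define MAP(n, a, d) void *mm_##n = (void *)(uintptr_t)(a)

MAP(reg0, 0, foo);   /* expands to: void *mm_reg0 = (void *)(uintptr_t)(0); */
MAP(reg1, 8, bar);   /* expands to: void *mm_reg1 = (void *)(uintptr_t)(8); */
```

Since the third argument never appears in the replacement list, foo and bar do not need to be declared anywhere; they only matter to the doxygen filter.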
The ISO C standard describes the output of the preprocessor as a stream of preprocessing tokens, not text. Comments are not preprocessing tokens; they are stripped from the input before tokenization happens. Therefore, within the standard facilities of the language, it is fundamentally impossible for preprocessing output to contain comments or anything that resembles them.
In particular, consider
#define EMPTY
#define NOT_A_COMMENT_1(text) /EMPTY/EMPTY/ text
#define NOT_A_COMMENT_2(text) / / / text
NOT_A_COMMENT_1(word word word)
NOT_A_COMMENT_2(word word word)
After translation phase 4, the fourth and fifth lines of the above will both become the six-token sequence
[/][/][/][word][word][word]
where square brackets indicate token boundaries. There isn't any such thing as a // token, and therefore there is nothing you can do to make the preprocessor produce one.
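The same token-level behavior can be observed with the other comment opener. In this minimal sketch (the function name divide_by_pointee is made up for the example), a slash and a star end up adjacent after macro expansion, yet they stay two separate tokens and are compiled as a division followed by a dereference:

```c
#define EMPTY

/* Phase 3 tokenizes the expression as [/][EMPTY][*][p], so no comment
   begins here. After EMPTY expands to nothing in phase 4, the tokens
   are [/][*][p], which the parser reads as a division by *p. */
int divide_by_pointee(int numerator, const int *p)
{
    return numerator /EMPTY*p;
}
```

With d set to 12, divide_by_pointee(24, &d) returns 2. Written without EMPTY, the same characters would open a comment, because comments are recognized before macro expansion takes place.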
Now, the ISO C standard doesn't specify the behavior of doxygen. However, if doxygen is reusing a preprocessor that came with someone's C compiler, the people who wrote that preprocessor probably thought textual preprocessor output should be, above all, an accurate reflection of the token sequence that the "compiler proper" would receive. That means it will forcibly insert spaces where necessary to make separate tokens remain separate. For instance, with test.c containing the above example,
$ gcc -E test.c
...
/ / / word word word
/ / / word word word
(I have elided some irrelevant chatter above the output we're interested in.)
If there is a way around this, you are most likely to find it in the doxygen manual. There might, for instance, be configuration options that teach it that certain macros should be understood to define symbols, and what symbols those are, and what documentation they should have.
#include<stdio.h>
#define A -B
#define B -C
#define C 5
int main() {
printf("The value of A is %d\n", A);
return 0;
}
I came across the above code. I thought that after preprocessing, it gets transformed to
// code from stdio.h
int main() {
printf("The value of A is %d\n", --5);
return 0;
}
which should result in a compilation error. But, the code compiles fine and produces output 5.
How does the code get preprocessed in this case so that it does not result in a compiler error?
PS: I am using gcc version 8.2.0 on Linux x86-64.
The preprocessor is defined as operating on a stream of tokens, not text. You have to read through all of sections 5.1.1, 6.4, and 6.10 of the C standard to fully understand how this works, but the critical bits are in 5.1.1.2 "Translation phases": in phase 3, the source file is "decomposed into preprocessing tokens"; phases 4, 5, and 6 operate on those tokens; and in phase 7 "each preprocessing token is converted into a token". That indefinite article is critical: each preprocessing token becomes exactly one token.
What this means is, if you start with this source file
#define A -B
#define B -C
#define C 5
A
then, after translation phase 4 (macro expansion, among other things), what you have is a sequence of three preprocessing tokens,
<punctuator: -> <punctuator: -> <pp-number: 5>
and at the beginning of translation phase 7 that becomes
TK_MINUS TK_MINUS TK_INTEGER:5
which is then parsed as the expression -(-(5)) rather than as --(5). The standard offers no latitude in this: a C compiler that parses your example as --(5) is defective.
When you ask a compiler to dump out preprocessed source as text, the form of that text is not specified by the standard; typically, what you get has whitespace inserted as necessary so that a human will understand it the same way translation phase 7 would have.
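The token-based reading is easy to check in a small program. The following is a minimal sketch using the macros from the question; the function name value_of_a and the trailing #undef lines are additions for this illustration.

```c
#define A -B
#define B -C
#define C 5

/* After expansion the return expression is the token sequence [-][-][5],
   which the parser reads as -(-(5)), never as the decrement operator. */
int value_of_a(void)
{
    return A;
}

/* Drop the single-letter macros so they cannot affect later code. */
#undef A
#undef B
#undef C
```

Calling value_of_a() returns 5, which matches the output reported in the question.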
alright, i understand that the title of this topic sounds a bit gibberish... so i'll try to explain it as clearly as i can...
this is related to this previous post (an approach that's been verified to work):
multipass a source code to cpp
-- which basically asks the cpp to preprocess the code once before starting the gcc compile build process
take the previous post's sample code:
#include <stdio.h>
#define DEF_X #define X 22
int main(void)
{
DEF_X
printf("%u", X);
return 1;
}
now, to be able to freely insert the DEF_X anywhere, we need to add a newline
this doesn't work:
#define DEF_X \
#define X 22
this still doesn't work, but is more likely to:
#define DEF_X \n \
#define X 22
if we get the latter above to work, thanks to C's free form syntax and constant string multiline concatenation, it works anywhere as far as C/C++ is concerned:
"literal_str0" DEF_X "literal_str1"
now when cpp preprocesses this:
# 1 "d:/Projects/Research/tests/test.c"
# 1 "<command-line>"
# 1 "d:/Projects/Research/test/test.c"
# 1 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 1 3
# 19 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 3
# 1 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 1 3
# 32 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 3
# 33 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/_mingw.h" 3
# 20 "c:\\mingw\\bin\\../lib/gcc/mingw32/4.7.2/../../../../include/stdio.h" 2 3
ETC_ETC_ETC_IGNORED_FOR_BREVITY_BUT_LOTS_OF_DECLARATIONS
int main(void)
{
\n #define X 22
printf("%u", X);
return 1;
}
we have a stray \n in our preprocessed file. so now the problem is to get rid of it....
now, the unix system commands aren't really my strongest suit. i've compiled dozens of packages in linux and written simple bash scripts that simply enter multiline commands (so i don't have to type them every time or keep pressing the up arrow and choose the correct command successions). so i don't know the finer points of stream piping and their arguments.
having said that, i tried these commands:
cpp $MY_DIR/test.c | perl -p -e 's/\\n/\n/g' > $MY_DIR/test0.c
gcc $MY_DIR/test0.c -o test.exe
it works, it removes that stray \n.
ohh, as to using perl rather than sed, i'm just more familiar with perl's variant to regex... it's more consistent in my eyes.
anyways, this has the nasty side effect of eating up any \n in the file (even in string literals)... so i need a script or a series of commands to:
remove a \n if:
it is not inside a quote -- so this won't be modified: "hell0_there\n"
it is not passed to a function call (inside the argument list)
this is safe as one can never pass a single \n, which is neither a keyword nor an identifier.
if i need to "stringify" an expression with \n, i can simply call a function macro QUOTE_VAR(token). so that encapsulates all instances that \n would have to be treated as a string.
this should cover all cases that \n should be substituted... at least for my own coding conventions.
really, i would do this if i could manage it on my own... but my regex skills are extremely lacking, i only use it for simple substitutions.
A better way is to replace \n only if it occurs at the beginning of a line.
The following command should do the work:
sed -e 's/^\s*\\n/\n/'
or only if it occurs before a #:
sed -e 's/\\n\s*#/\n#/g'
or you can reverse the order of preprocessing and substitute DEF_X with your own tool before C preprocessor.
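If a custom filter is preferred over sed, the first rule from the question (replace \n only outside quotes) can be sketched in C. This is a minimal illustration, not a complete C lexer: the function name replace_stray_newlines is made up, it does not implement the argument-list rule, and it assumes comments have already been removed, as they are in cpp output.

```c
#include <stddef.h>

/* Copies in to out, replacing each backslash followed by 'n' that appears
   outside string and character literals with a real newline.
   Returns the number of characters written (excluding the terminator),
   or (size_t)-1 if out is too small. */
size_t replace_stray_newlines(const char *in, char *out, size_t cap)
{
    char quote = 0;   /* 0 outside literals, otherwise the opening quote */
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;

    for (size_t i = 0; in[i] != '\0'; i++) {
        char c = in[i];

        /* Each step writes at most two characters, plus the final NUL. */
        if (n + 2 >= cap)
            return (size_t)-1;

        if (quote) {
            out[n++] = c;
            if (c == '\\' && in[i + 1] != '\0')
                out[n++] = in[++i];   /* keep the escape sequence intact */
            else if (c == quote)
                quote = 0;            /* literal ends here */
        } else if (c == '"' || c == '\'') {
            quote = c;
            out[n++] = c;
        } else if (c == '\\' && in[i + 1] == 'n') {
            out[n++] = '\n';          /* stray \n becomes a line break */
            i++;
        } else {
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return n;
}
```

Applied to the stray line from the preprocessed file, the leading backslash and n become a line break in front of #define X 22, while "hell0_there\n" passes through unchanged.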
Is it possible to get the entire string on the line reported through the __LINE__ macro?
Sample code:
#include <stdio.h>
#define LOG(lvl) pLog(lvl, __LINE__, __FILE__)
void pLog(const char *str, int line, const char *file)
{
printf("Line [%d]: File [%s]", line, file);
}
int main ()
{
LOG("Hello"
"world");
return 0;
}
The output is: Line [13]: File [macro.c]
Now in a large code base i want to search this file and print the string "Hello world" present at line reported (in this case it is 13)
One way I was thinking of is this: first generate the preprocessed output with gcc -E, grep it for pLog and save the strings; then grep for LOG in the actual source file and save the line numbers; then match those line numbers against the one in the log output and print the string at the matching index.
Since a string can be spread across multiple lines (in the code above, Hello is on one line and world on another), that also needs to be handled.
Is there any other better and faster way of doing it, or does gcc provide some option to map a line and file back to the actual code?
This is very easy to do with Clang. The following command dumps Abstract Syntax Tree (AST) for the file test.c to the file out:
clang -cc1 -ast-dump test.c > out
Looking at the AST in the generated file you can easily find the information you need:
(StringLiteral 0x1376cd8 <line:12:9, line:13:13> 'char [11]' lvalue "Helloworld")))
Clang gives start of the first token of the string (line:12:9), start of the last token of the string (line:13:13) and the full string ("Helloworld").
You can either parse the AST dump or use Clang API to get the same information. If this is not a one time task, I'd go for API since the AST dump format is more likely to change in the future.
All this of course makes sense only if you have a reason not to print the string in pLog itself.
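For comparison, printing the string in pLog itself could look roughly like this. It is a sketch under assumptions: this pLog takes an output buffer and the message, which differs from the signature in the question, and demo_log is a made-up helper.

```c
#include <stdio.h>

/* Hypothetical variant of pLog that records the message text as well. */
int pLog(char *out, size_t cap, const char *msg, int line, const char *file)
{
    return snprintf(out, cap, "Line [%d]: File [%s]: %s", line, file, msg);
}

#define LOG(out, cap, msg) pLog((out), (cap), (msg), __LINE__, __FILE__)

/* Adjacent string literals are concatenated before the call is compiled,
   so a message split over two source lines arrives as one string. */
int demo_log(char *out, size_t cap)
{
    return LOG(out, cap, "Hello"
                         "world");
}
```

The formatted record then ends in Helloworld, the same concatenated literal that the Clang AST dump shows, so nothing has to be recovered from the source afterwards.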