C preprocessor directives in preprocessed output [duplicate] - c

This question already has answers here:
What is the meaning of lines starting with a hash sign and number like '# 1 "a.c"' in the gcc preprocessor output?
(3 answers)
Closed 7 years ago.
These are the first few lines of the pre-processor output of a simple C program. What do they mean?
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 325 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2
# 1 "some_path/stdio.h" 1 3 4
# 64 "some_path/stdio.h" 3 4
Here's my program:
#include <stdio.h>
int main()
{
printf("Hello, World!\n");
return 0;
}

# linenum filename flags
These are called linemarkers. They are inserted as needed into the output (but never within a string or character constant). They mean that the following line originated in file filename at line linenum. filename will never contain any non-printing characters; they are replaced with octal escape sequences.
After the file name comes zero or more flags, which are ‘1’, ‘2’, ‘3’, or ‘4’. If there are multiple flags, spaces separate them. Here is what the flags mean:
‘1’ This indicates the start of a new file.
‘2’ This indicates returning to a file (after having included another file).
‘3’ This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
‘4’ This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.
Source: GCC Manual

Related

What lines <built-in>, <command-line> and from where a header it taken after simple gcc -E means?

main.c:
int main() { return 0; }
After preprocessing stage: gcc -E main.c
# 1 "main.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "main.c"
int main() { return 0; }
I know that:
the first numbers are line numbers of a processed file;
the "strings" are file names;
the numbers at the end of lines are described here https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html
What does other lines mean? I mean: <built-in>, <command-line> and from where /usr/include/stdc-predef.h is taken?
Here I found this question GCC preprocessing, what are the built-in and command-line lines for? almost "without" answers.
gcc version 8.3.0 (Debian 8.3.0-6)
UPDATED: Explanation of /usr/include/stdc-predef.h
The header file stdc-predef.h was hardcoded in gcc/config/glibc-c.c (from git repo):
26 /* Implement TARGET_C_PREINCLUDE for glibc targets. */
27
28 static const char *
29 glibc_c_preinclude (void)
30 {
31 return "stdc-predef.h";
32 }
It is processed in push_command_line_include of gcc/c-family/c-opts.c:
1534 /* Give CPP the next file given by -include, if any. */
1535 static void
1536 push_command_line_include (void)
1537 {
1538 /* This can happen if disabled by -imacros for example.
1539 Punt so that we don't set "<command-line>" as the filename for
1540 the header. */
1541 if (include_cursor > deferred_count)
1542 return;
1543
1544 if (!done_preinclude)
1545 {
1546 done_preinclude = true;
1547 if (flag_hosted && std_inc && !cpp_opts->preprocessed)
1548 {
1549 const char *preinc = targetcm.c_preinclude ();
1550 if (preinc && cpp_push_default_include (parse_in, preinc))
1551 return;
1552 }
1553 }
and pseudo-filenames "<built-in>" and "<command-line>" are added in c_finish_options there also.
Start with an empty header.
$ touch foo.h
You are already aware of the numbers in the output of the preprocessor, so won't re-iterate. Coming to <built-in>, it is the list of the predefined macros. Using the preprocessor documentation
-dM Instead of the normal output, generate a list of #define
directives for all the macros defined during the
execution of the preprocessor, including predefined
macros. This gives you a way of finding out what is
predefined in your version of the preprocessor. Assuming
you have no file foo.h, the command
touch foo.h; cpp -dM foo.h
shows all the predefined macros.
So, doing that should give all the predefined macros and their expanions as:
#define __SSP_STRONG__ 3
#define __DBL_MIN_EXP__ (-1021)
#define __FLT32X_MAX_EXP__ 1024
#define __UINT_LEAST16_MAX__ 0xffff
#define __ATOMIC_ACQUIRE 2
:
To see how <command-line> is expanded, pass in a command-line define using the -DX=Y syntax
$ gcc -E -DDBG=1 -dN foo.h|grep 'command-line' -A 1 -B 1
#define __DECIMAL_BID_FORMAT__
# 1 "<command-line>"
#define DBG
-- #define __STDC_ISO_10646__
# 1 "<command-line>" 2
# 1 "foo.h"
DBG shows up under the <command-line> set
As for "/usr/include/stdc-predef.h", well that's the file that contains some of those pred-defined macros. e.g on my system:
#ifdef __GCC_IEC_559
# if __GCC_IEC_559 > 0
# define __STDC_IEC_559__ 1
# endif
which matches with the pre-processor output:
$ gcc -E foo.h -dM|grep __STDC_IEC_559__
#define __STDC_IEC_559__ 1
You can always use the cpp binary for just doing the pre-processing part instead of using gcc -E.
A lot more is actually explained in this answer.

Preprocessor only on arbitrary file?

I wanted to demonstrate that the preprocessor is totally independant of the build process. It is another grammar and another lexer than the C language. In fact I wanted to show that the preprocessor could be applied to any type of file.
So I have this arbitrary file:
#define FOO
#ifdef FOO
I am foo
#endif
#
#
#
Something
#pragma Hello World
And I thought this would work:
$ gcc -E test.txt -o-
gcc: warning: test.txt: linker input file unused because linking not done
Unfortunately it only work with this:
$ cat test.txt | gcc -E -
Why is this error with GCC?
You need to tell gcc it's a C file.
gcc -xc -E test.txt
The C compiler uses the file name suffix as an indicator of the files that have to be compiled (ending in .c) files that have only to be linked (ending in .o or .so) For the files ending in .s it calls the assembler as(1) and for files ending in .f it call the fortran compiler, and for .cc it switches to C++ compiling.
Indeed, normally, C compilers take everything they don't match as a linker file, so once you pass it a linker file, it tries to link it, calling the linker ld(1). This is what happens with your .txt file. The linker has some similar way to recognise ld(1) scripts against object or shared object files.
BTW, the CPP language is indeed a macro language, but there's some similarities with C that cannot be avoided. It has, at least, to recognise C identifiers, as macro names have the same syntax as C identifiers, and it has to check that an identifier matches a macro name or not. In other side... It has to recognise C comments, and C strings (it indeed eliminates comments for the compiler), as macro expansion doesn't enter to expand inside them, and it has also to recognize parenthesis (they are considered for macro parameter detection and the , symbol, used to separate parameters). It also recognizes (inside the macro string) the tokens # (to stringify a parameter) and ## (to catenate and merge two symbols into one) (this last operator must force cpp to recognise almost any C token, as it must check for errors if you try to merge something like +##+ into ++, which is an error)
So, the conclussion is: the cpp doesn't have to implement the whole C syntax as a language, but the tokens of the C language must be recognised almost completely. The standard for the C language forces the c preprocessor to tokenize the input, so the ## operator can be used to merge tokens (and to check for validity) This means that, if you define a macro like:
#define M(p) +p
and then you call it like:
a = +M(-c);
you will get a string similar to:
a = + +-c;
in the output (it will insert a space in between the two + signs, so they don't get merged into ++ operator. The symbols + and - are together, because they will never be scanned as one token) See the next example (input is preceded by > symbol)
$ cpp - <<EOF
> #define M(p) +p
> a = +M(p);
> b = -M(p);
> p = +M(+p);
> p = +M(-p);
> EOF
# 1 "<stdin>"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 346 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "<stdin>" 2
a = + +p;
b = -+p;
p = + + +p;
p = + +-p;
Another example will show more difficulties in parsing the tokens (input is delimited with >, stderr with >> and stdout is unquoted):
$ cpp - <<EOF
#define M(a,b) a##b
> a = M(a+,+b)
> a = M(a+,-b)
> a = M(a,+b)
> a = M(a,b)
> a = M(a,300)
> a = M(a,300.2)
> EOF
>> <stdin>:3:5: error: pasting formed '+-', an invalid preprocessing token
>> a = M(a+,-b)
>> ^
>> <stdin>:1:17: note: expanded from macro 'M'
>> #define M(a,b) a##b
>> ^
>> <stdin>:4:5: error: pasting formed 'a+', an invalid preprocessing token
>> a = M(a,+b)
>> ^
>> <stdin>:1:17: note: expanded from macro 'M'
>> #define M(a,b) a##b
>> ^
>> <stdin>:7:5: error: pasting formed 'a300.2', an invalid preprocessing token
>> a = M(a,300.2)
>> ^
>> <stdin>:1:17: note: expanded from macro 'M'
>> #define M(a,b) a##b
>> ^
>> 3 errors generated.
# 1 "<stdin>"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 346 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "<stdin>" 2
a = a++b
a = a+-b
a = a+b
a = ab
a = a300
a = a 300.2
As you can see in this example, merging a and 300 goes fine, as one token makes an identifier, which is valid and cpp(1) doesn't complain, but when merging a and 300.2 the resulting token a300.2 is not a valid token in C, so it is rejected (it is also not joined and the tool inserts a space, to make the compiler see both tokens as separate ---should it joined both together, they would have been scanned as the tokens a300 and .2).
If you want to use a language independent macro preprocesor, consider using m4(1) as a macro language. It's far more powerful than cpp in many ways. But beware, it's difficult to learn due to the complexity of macro expansions it allows.
You can use the C preprocessor, cpp (or the more traditional form, /lib/cpp):
cpp test.txt
or
/lib/cpp test.txt

How to read the C-preprocessor file [duplicate]

This question already has answers here:
What is the meaning of lines starting with a hash sign and number like '# 1 "a.c"' in the gcc preprocessor output?
(3 answers)
Closed 3 years ago.
I am calling on the crowd to help me understand how to read the preprocessor output. I am attempting to go through an exercise of going through the compilation process of a simple C application on Ubuntu 18.04.
The code simpler.c
#include "simpler.h"
int main()
{
// This is a comment
return 0;
}
for simpler.h
int y;
I then run the command
username$ cpp simpler.c simpler_cpp
This then produces the preprocessed c file as follows
# 1 "simpler.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "simpler.c"
# 1 "simpler.h" 1
int y;
# 3 "simpler.c" 2
int main()
{
return 0;
}
Looking this over I am not sure I follow how to read this file? Or at least put it in terms I understand. I do however see that my comment is not there any more, and that I have my line from my header file there. But other than that this is not too clear.
I am attempting to tell myself a "story" with this file, such as "The # 1 is an preprocessor index of the file, so there are # 1, # 31, # 32, and # 3 or 4 files total". That the "# 31 "" means something... I really don't know what?
If anyone can help me interpret this file, I would greatly appreciate it.
I have been attempting to follow the page http://gcc.gnu.org/onlinedocs/cpp/index.html#Top
but it reads more like an encyclopedia, which although good if you know the road map, if you are starting from square one it becomes more challenging.
Possible Answer:
Thanks for the responses guys,
according to https://gcc.gnu.org/onlinedocs/gcc-9.1.0/cpp/Preprocessor-Output.html#Preprocessor-Output
when reading the output of the preprocessed c file, first macros are expanded, comments are removed, and long runs of blanks lines are discarded.
I read the line
# 1 "simpler.c"
is a linemarker and means that the following line originated in file "simpler.c" at line 1.
After the file there are no flags, which is not described in the file.
The other thing I can do is look at the line
# 1 "simpler.c"
and say the file simpler.c exists, and then ignore it. Which is probably the most practical thing to do. Except I wonder what the
# 1 "built-in"
# 1 "command-line"
mean?
if I ignore these then I get something that looks like
int y;
int main()
{
return 0;
}
Which is what I originally expected from the description of cpp.
Last edit for the day. One thing I have found is the command
cpp -P simpler.c simpler_cpp
give the output
int y;
int main()
{
return 0;
}
There is a no #line flag in the man page that outputs the preprocessed file without any line information. I am guessing that this is really the only output that does matter. I am guessing that this output should have a .i extension, but I don' tknow . Oh well, I hope this is useful to anyone else. If I find any good information out there I will try and write something up.
The lines starting with # exist so the compiler can report errors on the correct file and line. The best thing to do is ignore them . In between them is C code with all macros and include files expanded. Typically one reads preprocessor output to debug macros, but your sample has none.
I guess this is a homework assignment to demonstrate include files.

Understanding the different #define Declarations

I have a code base which uses #define in a different way then I am accustomed to.
I know that, for example, #define a 5 will replace variable a with 5 in the code.
But what would this mean:
'#define MSG_FLAG 5, REG, MSGCLR'
I tried doing it in a simple code and compiling it. It takes the last value (like the third argument as MSGCLR).
Preprocessing is largely just string replacement that happens before the "real" compilation starts. So we don't have any idea of what a variable is at this point.
The commas here are not any special syntax. This will cause any appearance of MSG_FLAG in the code to be replaced by 5, REG, MSGCLR
Most compilers have a flag that will just run the preprocessor, so you can see for yourself. On gcc, this is -E.
So to verify this, we can have some nonsense source:
#define MSG_FLAG 5, REG, MSGCLR
MSG_FLAG
Compile with gcc -E test.c
And the output is:
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
5, REG, MSGCLR

cpp preprocessor output not able to understand? [duplicate]

This question already has answers here:
What is the meaning of lines starting with a hash sign and number like '# 1 "a.c"' in the gcc preprocessor output?
(3 answers)
Closed 2 years ago.
Sorry if my question is very basic. I would like to understand the output produced by the preprocessor cpp. Let's say i have a very basic following program.
#include <stdio.h>
#include <stdlib.h>
int x=100;
int main ()
{
printf ("\n Welcome..\n");
}
I execute the following command.
cpp main.c main.i
in main.i
# 1 "/usr/include/stdio.h" 1 3 4
What is the meaning of the above line ?..
The gcc documentation explains the C preprocessor output aptly.
Here are the relevant sections:
The output from the C preprocessor looks much like the input, except that all preprocessing directive lines have been replaced with blank lines and all comments with spaces. Long runs of blank lines are discarded.
Source file name and line number information is conveyed by lines of the form
# linenum filename flags
These are called linemarkers. They are inserted as needed into the output (but never within a string or character constant). They mean that the following line originated in file filename at line linenum. filename will never contain any non-printing characters; they are replaced with octal escape sequences.
After the file name comes zero or more flags, which are 1, 2, 3, or 4. If there are multiple flags, spaces separate them. Here is what the flags mean:
1 This indicates the start of a new file.
2
This indicates returning to a file (after having included another file).
3
This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
4
This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.

Resources