Preprocessor output fields - c

I have these lines in the preprocessor output:
......
# 1 "test.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 374 "/usr/include/features.h" 1 3 4
......
I understand that at line 27 of stdio.h there is an include of features.h, but what do the other numbers (1, 3, 4) on these lines mean?
Can anyone explain in a little more detail what exactly these different fields mean?
As you can see:
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
Why are there two inclusions of stdio.h? Or, if I am wrong, what does that mean?

From the GCC documentation:
Source file name and line number information is conveyed by lines of
the form
# linenum filename flags
These are called linemarkers. They are inserted as needed into the output (but never within a string or
character constant). They mean that the following line originated in
file filename at line linenum. filename will never contain any
non-printing characters; they are replaced with octal escape
sequences.
After the file name comes zero or more flags, which are ‘1’, ‘2’, ‘3’,
or ‘4’. If there are multiple flags, spaces separate them. Here is
what the flags mean:
‘1’ This indicates the start of a new file.
‘2’ This indicates returning to a file (after having included another file).
‘3’ This indicates that the following text comes from a system header file, so
certain warnings should be suppressed.
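‘4’ This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.
So stdio.h is not included twice in your output: the first linemarker (flag ‘1’) marks the start of the file, and the second one only says that the text that follows comes from line 27 of that same file.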
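To make the field layout concrete, here is a minimal C sketch that splits one linemarker into its line number, file name, and flags. The names struct linemarker and parse_linemarker are made up for this illustration and are not part of GCC; the sketch also assumes the file name contains no escaped quote characters and fits in 255 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of "# linenum filename flags". */
struct linemarker {
    long line;        /* linenum field */
    char file[256];   /* filename field, without the surrounding quotes */
    int flags[5];     /* flags[n] is 1 when flag n (1..4) is present */
};

/* Returns 1 and fills *out if text is a linemarker, otherwise returns 0.
   Assumption: no escaped quotes inside the file name. */
int parse_linemarker(const char *text, struct linemarker *out)
{
    int consumed = 0;

    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    /* %n is only reached when the closing quote matched, so consumed
       stays 0 for a line with an unterminated file name. */
    if (sscanf(text, "# %ld \"%255[^\"]\"%n",
               &out->line, out->file, &consumed) != 2 || consumed == 0)
        return 0;

    /* Everything after the file name must be flags 1..4 or whitespace. */
    for (const char *p = text + consumed; *p != '\0'; p++) {
        if (*p >= '1' && *p <= '4')
            out->flags[*p - '0'] = 1;
        else if (*p != ' ' && *p != '\n' && *p != '\r')
            return 0;
    }
    return 1;
}
```

Called on the two stdio.h lines from the question, this should report line 1 with flags 1, 3 and 4 for the first, and line 27 with only flags 3 and 4 for the second, which is what distinguishes "start of file" from a plain line-number update.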

Related

Why does an array of text lines appear to have an extra level of container?

I'm reading a file using the "array of lines" mode of Dyalog's ⎕nget:
lines _ _ ← ⎕nget '/usr/share/dict/words' 1
And it appears to work:
lines[1]
10th
But the individual elements don't appear to be character arrays:
line ← lines[1]
line
10th
≢ line
1
⍴ line
Here we see that the first line has a tally of 1 and a shape of the empty array. I can't index into it any further; lines[1][1] or line[1] is a RANK ERROR. If I use ⊂ on the RHS I can assign the value to multiple variables at once and get the same behavior for each variable. But if I do a multiple assignment without the left shoe, I get this:
word rest ← line
word
10th
≢ word
4
⍴ word
4
At last we have the character array I expected! Yet it was not evidently separated from anything else hidden in line; the other variable is identical:
rest
10th
≢ rest
4
⍴ rest
4
word ≡ rest
1
Significantly, when I look at word it has no leading space, unlike line. So it seems that the individual array elements in the content matrix returned by ⎕nget are further wrapped in something that doesn't show up in shape or tally, and can't be indexed into, but when I use a destructuring assignment it unwraps them. It feels rather like the multiple-values stuff in Common Lisp.
If someone could explain what's going on here, I'd appreciate it. I feel like I'm missing something incredibly basic.
The result of reading a file with "array of lines" mode is a nested array. It is specifically a nested vector of character vectors where each character vector is a line from your text file.
For example, take \tmp\test.txt here:
my text file
has 3
lines
If we read this in, we can inspect the contents
(content encoding newline) ← ⎕nget'\tmp\test.txt' 1
≢ content ⍝ How many lines?
3
≢¨content ⍝ How long is each line?
12 5 5
content[2] ⍝ Indexing returns a scalar (non-simple)
┌─────┐
│has 3│
└─────┘
2⊃content ⍝ Use pick to get the contents of the 2nd scalar
has 3
⊃content[2] ⍝ Disclose the non-simple scalar
has 3
As you probably read from the online documentation, the default behaviour of ⎕NGET is to bring in a simple (non-nested) character vector with embedded new line characters. These are typically operating-system dependent.
(content encoding newline) ← ⎕nget'\tmp\test.txt'
newline ⍝ Unicode code points for line endings in this file (Microsoft Windows)
13 10
content
my text file
has 3
lines
content ∊ ⎕ucs 10 13
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1
But with "array of lines" mode, you get a nested result.
For a quick introduction to nested arrays and the array model, see Stefan Kruger's LearnAPL book.
If you turn boxing on it's easier to see what's happening. Each element is an enclosed character vector. Use pick ⊃ instead of bracket index [] to get the actual item.
words ← ⊃⎕nget'/usr/share/dict/words'1
]box on -s=max
⍴words
┌→─────┐
│235886│
└~─────┘
words[10]
┌─────────┐
│ ┌→────┐ │
│ │Aaron│ │
│ └─────┘ │
└∊────────┘
10⊃words ⍝ use pick
┌→────┐
│Aaron│
└─────┘

C Preprocessor output [duplicate]

For gcc -E sample.c -o sample.i with the following input C program,
#include <stdio.h>
int main() {
    printf("hello world\n");
    return 0;
}
sample.i contains the following output, with lines that start with a # symbol, and I wonder what exactly those # lines mean.
# 1 "sample.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "sample.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
...
These are linemarkers: annotations that help a person identify how the preprocessor expanded the various #include <...> directives and other items.
Reading these lines is the equivalent of reading the preprocessor's log messages as it encounters directives and macros and expands them.
# 1 "sample.c"
Start on line one of the input "sample.c"
# 1 "<built-in>"
Process the built-in C preprocessor definitions (this must be an implementation detail), presented as a fake "file".
# 1 "<command-line>"
Process the command-line directives (again an implementation detail), presented as a fake "file".
# 1 "/usr/include/stdc-predef.h" 1 3 4
Include the stdc-predef.h file, starting at its line 1: flag 1 marks the start of a new file, flag 3 marks it as a system header so certain warnings are suppressed, and flag 4 says the text should be treated as wrapped in an implicit extern "C" block.
# 1 "<command-line>" 2
Return to the command-line "fake" file (flag 2), now that stdc-predef.h has been processed.
# 1 "sample.c"
Back in sample.c.
# 1 "/usr/include/stdio.h" 1 3 4
Now starting the file "stdio.h" (flag 1); it is a system header, so certain warnings are suppressed (flag 3), and its text is treated as wrapped in an implicit extern "C" block (flag 4).
# 27 "/usr/include/stdio.h" 3 4
And so on...
The documentation is here.

What do the numbers mean in the preprocessed .i files when compiling C with gcc?

I am trying to understand the compiling process. We can see the preprocessor intermediate file by using:
gcc -E hello.c -o hello.i
or
cpp hello.c > hello.i
I roughly know what the preprocessor does, but I have difficulties understanding the numbers in some of the lines. For example:
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 374 "/usr/include/features.h" 3 4
The numbers can help a debugger display line numbers, so my guess is that the first column is the line number within the file named in the second column. But what do the numbers that follow mean?
The numbers following the filename are flags:
1: This indicates the start of a new file.
2: This indicates returning to a file (after having included another file).
3: This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
4: This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.
Source: https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html
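As a rough illustration of how flags 1 and 2 bracket an include, the following C sketch walks a list of linemarkers and tracks the nesting depth. The function names depth_change and max_include_depth are invented for this example, and the parsing is deliberately naive: it assumes well-formed markers of the form shown in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns +1 if the linemarker starts a new file (flag 1), -1 if it
   returns to a file (flag 2), and 0 otherwise, including non-markers. */
int depth_change(const char *line)
{
    assert(line != NULL);
    if (line[0] != '#' || line[1] != ' ')
        return 0;

    const char *p = strrchr(line, '"');   /* closing quote of the file name */
    if (p == NULL)
        return 0;

    /* Scan only the flags that follow the file name. */
    for (p++; *p != '\0'; p++) {
        if (*p == '1')
            return 1;
        if (*p == '2')
            return -1;
    }
    return 0;
}

/* Returns the deepest include nesting reached over the given lines,
   counting the main source file as depth 0. */
int max_include_depth(const char *const lines[], size_t count)
{
    int depth = 0;
    int deepest = 0;

    for (size_t i = 0; i < count; i++) {
        depth += depth_change(lines[i]);
        if (depth > deepest)
            deepest = depth;
    }
    return deepest;
}
```

For the seven markers in the question the depth goes 1, 0, 0, 1, 1, 2, 2: stdc-predef.h is entered and left again, then stdio.h is entered, and features.h is entered from inside it.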

path resolution of include

Is there a definitive way to see where a given #include <example.h> resolves to? I have a #include <linux/unistd.h> in my code but I don't know which unistd.h is being used.
If you use the -E command line option to get the preprocessor output, it will tell you the full path to every header file included, including those included by other headers. For example:
$ cat test.c
#include <unistd.h>
$ gcc -E test.c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"
# 1 "/usr/include/unistd.h" 1 3 4
# 71 "/usr/include/unistd.h" 3 4
# 1 "/usr/include/_types.h" 1 3 4
# 27 "/usr/include/_types.h" 3 4
# 1 "/usr/include/sys/_types.h" 1 3 4
# 32 "/usr/include/sys/_types.h" 3 4
# 1 "/usr/include/sys/cdefs.h" 1 3 4
# 33 "/usr/include/sys/_types.h" 2 3 4
# 1 "/usr/include/machine/_types.h" 1 3 4
# 34 "/usr/include/machine/_types.h" 3 4
# 1 "/usr/include/i386/_types.h" 1 3 4
# 37 "/usr/include/i386/_types.h" 3 4
typedef signed char __int8_t;
(lots more output)
So in this case, the header that's being used is /usr/include/unistd.h.
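If the output is long, the same check can be automated. The C sketch below tests whether a single line of -E output is the "start of file" marker for a given header and, if so, extracts the full path. The function name marker_resolves is hypothetical, and the matching rule (flag 1 present, path ends with "/" followed by the header name as written in the #include) is an assumption that seems reasonable for output like the above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If line is a linemarker that starts a new file (flag 1) and whose path
   ends with "/<header>", copies the path into path and returns 1.
   Returns 0 otherwise, or when the path does not fit in size bytes. */
int marker_resolves(const char *line, const char *header,
                    char *path, size_t size)
{
    assert(line != NULL && header != NULL && path != NULL && size > 0);

    const char *open = strchr(line, '"');    /* opening quote of the path */
    const char *close = strrchr(line, '"');  /* closing quote of the path */
    if (line[0] != '#' || open == NULL || close == open)
        return 0;
    if (strchr(close, '1') == NULL)          /* flag 1: start of a new file */
        return 0;

    size_t len = (size_t)(close - open - 1); /* length of the quoted path */
    size_t hlen = strlen(header);
    if (len >= size || len < hlen + 1)
        return 0;

    const char *tail = close - hlen;         /* where header would begin */
    if (tail[-1] != '/' || strncmp(tail, header, hlen) != 0)
        return 0;

    memcpy(path, open + 1, len);
    path[len] = '\0';
    return 1;
}
```

Passing the header exactly as written in the source (for example linux/unistd.h rather than unistd.h) keeps /usr/include/unistd.h and a linux/unistd.h in some other directory from being confused with each other.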
From the GCC documentation,
On a normal Unix system, if you do not instruct it otherwise, it will look for headers requested with #include in:
/usr/local/include
libdir/gcc/target/version/include
/usr/target/include
/usr/include
In the above, target is the canonical name of the system GCC was configured to compile code for; often but not always the same as the canonical name of the system it runs on. version is the version of GCC in use.
You can add to this list with the -Idir command line option. All the directories named by -I are searched, in left-to-right order, before the default directories. The only exception is when dir is already searched by default. In this case, the option is ignored and the search order for system directories remains unchanged.
So unless you're adding the -I (capital eye, not ell) switch, the version included is the version found in the first of those directories that holds it.

What files could have been included?

I would like to be able to get a list of all possible files included in a C source file.
I understand there are complications with other # directives (for instance, an #ifdef could either prevent an include or cause an extra include). All I'm looking for is a list of files that may have been included.
Is there a tool that already does this?
The files I'm compiling are only going to .o, and the standard C libraries are not included. I know that sounds wonky, but we have our reasons.
The reason I want to be able to do this is I want to have a list of files which may have contributed something to the .o, so I can check to see if they have changed.
Quoting the man page for gcc:
-M Instead of outputting the result of preprocessing, output a rule
suitable for make describing the dependencies of the main source
file. The preprocessor outputs one make rule containing the object
file name for that source file, a colon, and the names of all the
included files, including those coming from -include or -imacros
command line options.
This basically does what you want. There are several other related options (all starting with -M) that give you different variants of this output.
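Since the rule has the shape "object file, colon, included files", turning it into a plain list of file names takes only a little code. The C sketch below is one way to do it; the name rule_dependencies and the DEP_NAME_SIZE limit are invented for this example, and it assumes that file names contain no spaces and that the target name contains no colon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEP_NAME_SIZE 256   /* hypothetical limit on one file name */

/* Copies up to max dependency names from a make rule of the form
   "target: dep1 dep2 \<newline> dep3" into names and returns the
   number stored. Assumption: no spaces inside file names. */
size_t rule_dependencies(const char *rule,
                         char names[][DEP_NAME_SIZE], size_t max)
{
    assert(rule != NULL && names != NULL);

    const char *p = strchr(rule, ':');   /* dependencies follow the colon */
    size_t count = 0;
    if (p == NULL)
        return 0;
    p++;

    while (*p != '\0' && count < max) {
        size_t len = strcspn(p, " \t\r\n");   /* length of the next token */
        if (len == 0) {                       /* on a separator: skip it */
            p++;
            continue;
        }
        /* A lone backslash is a line continuation, not a file name. */
        if (!(len == 1 && *p == '\\') && len < DEP_NAME_SIZE) {
            memcpy(names[count], p, len);
            names[count][len] = '\0';
            count++;
        }
        p += len;
    }
    return count;
}
```

Each returned name can then be compared against the .o file's modification time, which is essentially what make itself does with such a rule.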
My syntax is rusty, but ...
grep -ir "#include " *.c
might work ...
If you use gcc, you can inspect the preprocessor dump:
[~]> gcc -E /usr/include/cups/dir.h|grep "#"
# 1 "/usr/include/cups/dir.h"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "/usr/include/cups/dir.h"
# 26 "/usr/include/cups/dir.h"
# 1 "/usr/include/sys/stat.h" 1 3 4
# 73 "/usr/include/sys/stat.h" 3 4
# 1 "/usr/include/sys/_types.h" 1 3 4
# 32 "/usr/include/sys/_types.h" 3 4
# 1 "/usr/include/sys/cdefs.h" 1 3 4
# 33 "/usr/include/sys/_types.h" 2 3 4
# 1 "/usr/include/machine/_types.h" 1 3 4
# 34 "/usr/include/machine/_types.h" 3 4
# 1 "/usr/include/i386/_types.h" 1 3 4
# 37 "/usr/include/i386/_types.h" 3 4
# 70 "/usr/include/i386/_types.h" 3 4
# 35 "/usr/include/machine/_types.h" 2 3 4
# 34 "/usr/include/sys/_types.h" 2 3 4
# 58 "/usr/include/sys/_types.h" 3 4
# 94 "/usr/include/sys/_types.h" 3 4
# 74 "/usr/include/sys/stat.h" 2 3 4
# 1 "/usr/include/sys/_structs.h" 1 3 4
# 88 "/usr/include/sys/_structs.h" 3 4
# 79 "/usr/include/sys/stat.h" 2 3 4
# 152 "/usr/include/sys/stat.h" 3 4
# 228 "/usr/include/sys/stat.h" 3 4
# 248 "/usr/include/sys/stat.h" 3 4
# 422 "/usr/include/sys/stat.h" 3 4
# 27 "/usr/include/cups/dir.h" 2
# 42 "/usr/include/cups/dir.h"
You could do the preprocessor step only. Most compilers allow this.
Of course that would require some busywork reading the resulting file.
All POSSIBLE files? No way to do that.
Alas, simply grepping the source for #include is not guaranteed to be enough, because someone may have committed ...
#define tricksy(foo,bar) <foo##bar>
#define precious tricksy(ios, tream)
#include precious
int main(int, char **)
{
    std::cout << "Hobbits!" << std::endl;
    return 0;
}
... though you would be able to tell by inspection that something nonstandard was going on with the #include precious because of the missing <> or "".
A non-perverse example would be token-pasting different library root directories depending on command-line definitions.
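A simpler, less tricksy variant of the same idea is a header chosen by a macro that a build may override on the command line. The C sketch below shows that form; LIB_HEADER and name_length are hypothetical names, and the -D usage in the comment is only one way such a macro might be set.

```c
#include <assert.h>

/* LIB_HEADER is a hypothetical macro name. A build could override it with
   something like -DLIB_HEADER='<string.h>' on the compiler command line;
   without that, the default below is used. */
#ifndef LIB_HEADER
#define LIB_HEADER <string.h>
#endif

/* Computed include: the macro is expanded before the header is looked up,
   so a plain text search for "#include <string.h>" would not find it. */
#include LIB_HEADER

/* Uses strlen, which is only declared if the computed include worked. */
unsigned long name_length(const char *name)
{
    assert(name != NULL);
    return (unsigned long)strlen(name);
}
```

Grepping this file for #include would only show LIB_HEADER, whereas the -E output would still contain a linemarker with the full path of the header that was actually read.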
