Which file is generated after preprocessing of a C program? - c

For a C program, just before compilation (i.e. after pre-processing has completed), which file (with what extension) is generated?

It is compiler dependent. Most compilers by default don't generate intermediate pre-processor files.
With gcc, if you add -save-temps option to get the intermediate files, the output of the pre-processor is dumped in a .i file. With -E option (to perform only the pre-processing), without -o to specify the output file, the result is dumped to stdout.

In most current compilers (e.g. GCC or Clang/LLVM) - and for performance reasons - the C/C++ preprocessor is an internal part of the compiler (in GCC it is libcpp/ and is a library ...), so no preprocessed form is output into a file.
In the very first C or proto-C compilers (1970s PDP-11) memory was so small (64 kilobytes!) that such an organization was not possible, and the preprocessor was a separate program, /lib/cpp.
Today, our laptops have several gigabytes of memory, which is usually much larger than the preprocessed form (of the largest source file you'll feed to your compiler). So current compilers keep some internal representation of the whole translation unit and are able to optimize it entirely (inter-procedural optimizations, including inlining).
All compilers keep several forms of the abstract syntax tree (AST); the bulk of the work of a compiler is not parsing or code generation, but transforming some internal representation of the AST into another internal representation (itself further transformed). In GCC most of the optimizations are working on the GIMPLE form. You can extend the compiler by adding your own optimization passes, e.g. with your GCC plugin.
In turn, this technological evolution has shaped the definition of our programming languages: the recent C++11 is designed for a heavily optimizing compiler. Recent style guides and coding hints around C++11 presuppose (and make sense only because of) very powerful optimizations.
You can still usually ask the compiler to emit the preprocessed form into a separate file, e.g. with gcc -C -E source.c > source.i (conventionally suffixed .i for C or .ii for C++; such suffixes can be known to build tools like make).

Journey of a C Program to Linux Executable in 4 Stages :
Pre-processing
Compilation
Assembly
Linking
Check this link for more details: C program compilation process

gcc -E fname.c > fname.x
Here fname.x is the file in which you save the pre-processed output.
The following four things happen in the pre-processing stage:
1> header file inclusion
2> comment removal
3> macro substitution (e.g. if you have #define NUM 10, every use of NUM in the code is replaced by 10)
4> conditional compilation, e.g.
#if 0
...
some code
...
#endif
Since the condition in "#if 0" is false, the code under it is not included in your pre-processed output and is therefore never compiled.

Related

How do I get a full assembly code from C file?

I'm currently trying to figure out the way to produce equivalent assembly code from corresponding C source file.
I've been using the C language for several years, but have little experience with assembly language.
I was able to output the assembly code using the -S option in gcc. However, the resulting assembly code contained call instructions which in turn make a jump to another function like _exp. This is not what I wanted, I needed a fully functional assembly code in a single file, with no dependency to other code.
Is it possible to achieve what I'm looking for?
To better describe the problem, I'm showing you my code here:
#include <math.h>
float sigmoid(float i){
return 1/(1+exp(-i));
}
The platform I am working on is Windows 10 64-bit, the compiler I'm using is cl.exe from MSbuild.
My initial objective was to see, at a lowest level possible, how computers calculate mathematical functions. The level where I decided to observe the calculation process is assembly code, and the mathematical function I've chosen was sigmoid defined as above.
_exp is the standard math library function double exp(double); apparently you're on a platform that prepends a leading underscore to C symbol names.
Given a .s that calls some library functions, build it the same way you would a .c file that calls library functions:
gcc foo.S -o foo -lm
You'll get a dynamic executable by default.
But if you really want all the code in one file with no external dependencies, you can link your .c into a static executable and disassemble that.
gcc -O3 -march=native foo.c -o foo -static -lm
objdump -drwC -Mintel foo > foo.s
There's no guarantee that the _exp implementation in libm.a (static library) is identical to the one you'd get in libm.so or libm.dll or whatever, because it's a different file. This is especially true for a function like memcpy where dynamic-linker tricks are often used to select an optimal version (for your CPU) at run-time.
It is not possible in general. There are exceptions, sure: I could craft one, so other folks can too, but it isn't an interesting program.
Normally your C program, starting at your main() entry point, is only a percentage of the code. There is a bootstrap that contains the actual entry point the operating system uses to launch your program; it prepares your virtual memory space so that your program can run, zeroes .bss, and other such things. It is often (and arguably should be) written in assembly language, otherwise you get a chicken-and-egg problem, but it is not an assembly file you will see unless you go and find the sources for the C library. You will usually get it as an object file that is part of the toolchain, along with other compiler libraries, etc.
Then if you make any C library calls, or write code that results in a compiler library call (a divide on a platform that doesn't support divide, floating point on a platform that doesn't have floating point, etc.), that is another object, built from some other C or assembly in the library or compiler sources, and not something you will see during the compile/assemble/link process (the chain in toolchain).
So except for specifically crafted trivial programs or specifically crafted tools for this purpose (for specific, likely bare-metal, platforms), you will not see your whole program turn into one big assembly source file before it gets assembled and then linked.
If not bare metal, then there is of course the operating system layer, which you certainly would not get to see as part of your source code. Ultimately the C library calls that need the system have a place where they do that, all compiled to objects/libraries before you use them, and the assembly sources for the operating system side are part of some other source and build process somewhere else.

Changing preprocessed values during compile time

I have written some code that uses preprocessor directives to skip some statements. But the C code inside main needs to change previously #defined values, assign new values depending on a condition, and also change the result of the preprocessed statements at run time. In short, I have to change the preprocessed statements at run time. How can I do this?
In short I have to change the pre processed statements during run time
This is impossible. Read about C preprocessing & cpp. Compile-time and run-time are different (and the compiled code could even run on a different machine, read more about cross-compiling). If using GCC, use gcc -C -E foo.c > foo.i to preprocess your foo.c source file into foo.i preprocessed form (and then use an editor or a pager to look inside that generated foo.i).
Perhaps you want to load additional code at runtime. This is not possible with pure C99 standard code. Perhaps your operating system offers dynamic loading. POSIX specifies dlopen. You might also want to use JIT compiling techniques to construct machine code at runtime, e.g. with libraries like GCCJIT, asmjit, GNU lightning, libjit, LLVM, ...
Read also about homoiconic languages. Consider coding in Common Lisp (e.g. with SBCL).
Perhaps you want to customize your GCC compiler with MELT.
Not possible. Preprocessing happens before compile-time.
The compiler only sees the result of the preprocessor, nothing more.

Can you add preprocessor directives in assembly?

I would like to execute some assembly instructions based on a define from a header file.
Let's say in test.h I have #define DEBUG.
In test.asm I want to check somehow like #ifdef DEBUG do something...
Is such thing possible? I was not able to find something helpful in the similar questions or online.
Yes, you can run the C preprocessor on your asm file. How to do this depends on your build environment. gcc, for example, automatically runs it for files with the extension .S (capital). Note that whatever you include should be asm compatible. It is common practice to conditionally include part of the header, using #ifndef ASSEMBLY or similar constructs, so you can have C and ASM parts in the same header.
The C preprocessor is just a program that inputs data (C source files), transforms it, and outputs data again (translation units).
You can run it manually like so:
gcc -E - < input > output
which means you can run the C preprocessor over .txt files, or latex files, if you want to.
The difficult bit, of course, is how you integrate that in your build system. This very much depends on the build system you're using. If that involves makefiles, you create a target for your assembler file:
assembler_file: input_1 input_2
	gcc -E - < $< > $@
and then you compile "assembler_file" in whatever way you normally compile it.
Sure, but that is no longer assembly language. You would need to feed it through a C preprocessor that knows this is a hybrid C/asm file, does the C preprocessing part but doesn't try to compile, and then either feeds the result to the assembler or has its own assembler built in.
Possible, heavily depends on your toolchain (either supported or not) but IMO leaves a very bad taste, YMMV.

Run GCC preprocessor non-C files

I'm using a proprietary development environment that compiles code written in C, as well as the IEC 61131 languages. For the C compilation, it uses GCC 4.1.2 with these build options:
-fPIC -O0 -g -nostartfiles -Wall -trigraphs -fno-asm
The compilation is done by a program running on windows utilizing Cygwin.
My issue is, IEC language preprocessor is not that useful (doesn't support #define at all) and I want to use macros! I don't see why the GCC preprocessor would really care what language it is processing (my target language is Structured Text), so I'm looking to see if anyone might know a way to get it to process files of different file types that then are not compiled further (I'm just looking for macro expansion before the file is run through the IEC compiler). I'm very ignorant of compiler options and environments since I've never had to deal with them, I just write C code and it magically compiles and transfers to my target system to run.
The only things I can really do are add build options and execute a batch file before anything is executed. I think my best hope lies in using a batch file to process all files of a certain extension, but I don't even know what executable in the gnuinst folder to use, let alone what flags to use to run through the files.
Just about any C preprocessor, including gcc's cpp, is going to assume that its input is valid C code. It has to tokenize the input following C (or C++, or Objective-C) rules, because it has to resolve its input into tokens (more precisely, preprocessing tokens). Constructs above the token level shouldn't be an issue.
You certainly can use cpp or gcc -E to preprocess text that isn't C source code, but some input constructs will cause problems.
Taking an example from the comments:
$ cat foo.txt
#define ADDTHEM(x, y) ((x) + (y))
ADDTHEM(2, 3)
$ gcc -E - < foo.txt
# 1 "<stdin>"
# 1 "<command-line>"
# 1 "<stdin>"
((2) + (3))
Note that I had to use gcc -E - < foo.txt rather than gcc -E foo.txt, because gcc treats a .txt file as a linker input file by default.
But if you add some content to foo.txt that doesn't consist of valid C preprocessor tokens, you can have problems:
$ cat foo.txt
#define ADDTHEM(x, y) ((x) + (y))
ADDTHEM(2, 3)
ADDTHEM('c, "s)
$ gcc -E - < foo.txt
# 1 "<stdin>"
# 1 "<command-line>"
# 1 "<stdin>"
((2) + (3))
<stdin>:3:9: warning: missing terminating ' character [enabled by default]
<stdin>:3:0: error: unterminated argument list invoking macro "ADDTHEM"
ADDTHEM
(Attempts to feed Ada source code to a C preprocessor have run into this kind of problem, since Ada uses isolated apostrophe ' characters for its attribute syntax.)
So you can do it if the input language doesn't use things that aren't valid C preprocessor tokens.
See the N1570 draft of the C standard, section 6.4, for more information about preprocessing tokens.
I actually wrote the above before I checked the GNU cpp manual, which says:
The C preprocessor is intended to be used only with C, C++, and
Objective-C source code. In the past, it has been abused as a general
text processor. It will choke on input which does not obey C's lexical
rules. For example, apostrophes will be interpreted as the beginning of
character constants, and cause errors. Also, you cannot rely on it
preserving characteristics of the input which are not significant to
C-family languages. If a Makefile is preprocessed, all the hard tabs
will be removed, and the Makefile will not work.
Having said that, you can often get away with using cpp on things
which are not C. Other Algol-ish programming languages are often safe
(Pascal, Ada, etc.) So is assembly, with caution. `-traditional-cpp'
mode preserves more white space, and is otherwise more permissive. Many
of the problems can be avoided by writing C or C++ style comments
instead of native language comments, and keeping macros simple.
Wherever possible, you should use a preprocessor geared to the
language you are writing in. Modern versions of the GNU assembler have
macro facilities. Most high level programming languages have their own
conditional compilation and inclusion mechanism. If all else fails,
try a true general text processor, such as GNU M4.
(The authors of that manual apparently missed the problem with Ada's attribute syntax.)

C/C++ Compiler listing what's defined

This question: "Is there a way to tell whether code is now being compiled as part of a PCH?" led me to think about this.
Is there a way, in perhaps only certain compilers, of getting a C/C++ compiler to dump out the defines that it's currently using?
Edit: I know this is technically a pre-processor issue but let's add that within the term compiler.
Yes. In GCC
g++ -E -dM <file>
I would bet it is possible in nearly all compilers.
Boost Wave (a preprocessor library that happens to include a command line driver) includes a tracing capability to trace macro expansions. It's probably a bit more than you're asking for though -- it doesn't just display the final result, but essentially every step of expanding a macro (even a very complex one).
The clang preprocessor is somewhat similar. It's also basically a library that happens to include a command line driver. The preprocessor defines a macro_iterator type and macro_begin/macro_end of that type, that will let you walk the preprocessor symbol table and do pretty much whatever you want with it (including printing out the symbols, of course).
