Get preprocessed C code altogether with library implementation

I use *nix/gcc. I am a beginner in C, so this may be an easy question. Let's say I have a file main.c with the following content:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World.\n");
    return 0;
}
What I want to get is one file that contains all the code after preprocessing together with the implementation code of every library function used (for inspection purposes). If I run
gcc -E main.c >> full.c
I do get a bunch of code, but it contains only data types and function prototypes. Is it possible to also get the implementations of all the functions, so that I can see the whole code in one place?
Thanks.

The C preprocessor only does text manipulation. In this case it generates an output file containing the contents of the include file stdio.h followed by the few lines of your main program.
It will not contain the implementations of the stdio.h functions, because that source lives in other files, which are not shipped in source form with gcc.
So if you want the implementation, you will need to find the source code of the C standard library your toolchain links against (on most Linux systems that is glibc, a project separate from gcc); it is available on the web.
However, if you want to understand the Standard C library, you should start with P. J. Plauger's book The Standard C Library or something similar, which provides an implementation along with annotations and notes about what is done and why.
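To make the point about declarations concrete, here is a minimal sketch, assuming a typical hosted toolchain. It replaces #include <stdio.h> with a hand-written prototype, which is roughly what survives of printf in the gcc -E output. The name say_hello is hypothetical and used only for illustration.

```c
/* Hand-written stand-in for what `#include <stdio.h>` contributes for printf:
 * a declaration only. The body of printf is not here; it lives in the
 * already-compiled C library and is supplied at link time. */
int printf(const char *format, ...);

/* say_hello is a hypothetical helper, not part of the original question. */
int say_hello(void)
{
    /* The compiler can type-check and emit this call from the prototype alone. */
    return printf("Hello World.\n");
}
```

Calling say_hello() from main prints the greeting even though the compiler never saw any library source, which is why the preprocessed file cannot show the implementation.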

Related

use variables from a C generated file

I have generated a C file that contains some arrays together with their sizes, like this:
nor.c (generated file)
const float nor[] = {
-0.0972819, 0.906339, -0.4112, -0.056471, 0.340899, -0.938402, -0.284611, 0.269598, -0.919953, 0.04142, -0.149024, -0.987966, 0.12454, -0.702485, -0.700717, 0.0027959, -0.188166, -0.982133 };
const unsigned int nor_size= 18;
When I declared this array in my main file like this:
extern float nor[nor_size];
I got an error saying that nor_size cannot be used as a constant.
How can I solve this?
Thanks.

how to solve this?
The best would be for the generator of the nor.c file to also generate a corresponding nor.h file
#ifndef NOR_INCLUDE_GUARD
#define NOR_INCLUDE_GUARD
extern const float nor[18];
extern const unsigned int nor_size;
#endif
main.c then uses that .h file:
// Rather than
// extern float nor[nor_size];
// Include
#include "nor.h"
I also recommend that the generator of the nor.c file append an f suffix to each constant, so that the initializers are float constants rather than double ones.
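One way to make the declared bound usable is to have the generator emit the size as an integer constant expression. The sketch below assumes a hypothetical NOR_SIZE enumerator and a hypothetical nor_sum helper, neither of which is in the original files. The three parts would normally live in nor.h, nor.c and main.c; they are shown in one translation unit only to keep the example compact.

```c
/* --- nor.h (hypothetical generated header) --- */
#ifndef NOR_INCLUDE_GUARD
#define NOR_INCLUDE_GUARD
enum { NOR_SIZE = 18 };              /* integer constant expression, legal as an array bound */
extern const float nor[NOR_SIZE];
extern const unsigned int nor_size;  /* same value, available at run time */
#endif

/* --- nor.c (generated file, values copied from the question) --- */
const float nor[NOR_SIZE] = {
    -0.0972819f, 0.906339f, -0.4112f, -0.056471f, 0.340899f, -0.938402f,
    -0.284611f, 0.269598f, -0.919953f, 0.04142f, -0.149024f, -0.987966f,
    0.12454f, -0.702485f, -0.700717f, 0.0027959f, -0.188166f, -0.982133f
};
const unsigned int nor_size = NOR_SIZE;

/* --- main.c side: nor_sum is a hypothetical consumer of the array --- */
float nor_sum(void)
{
    float sum = 0.0f;
    for (unsigned int i = 0; i < nor_size; i++)
        sum += nor[i];
    return sum;
}
```

The original error arises because a const unsigned int object is not a constant expression in C, so it cannot size a file-scope array; an enumerator or a #define can.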

In your main.c remove extern float nor[nor_size]; then add
extern const float nor[];
extern const unsigned int nor_size;
and use them as appropriate. Note that nor_size might then be loaded from memory (into processor registers, at the machine code level) several times.
For more, read a C reference, and later a C standard draft such as n1570 or newer.
If you are allowed to compile all your C code (both handwritten and generated) with a recent GCC compiler (or cross-compiler), invoke it with at least the -Wall -Wextra -g options.
Don't forget to read the documentation of your C compiler!
You also need to understand which linker you are using, and read its documentation. It may be the one from GNU binutils.
You could use a static source code analyzer, like the Clang static analyzer, Frama-C, or Bismon. FYI, Bismon has 24604 lines of generated C code, and the Bigloo compiler has a lot of them too.
For Frama-C or Bismon, contact me by email at basile.starynkevitch#cea.fr and mention the URL of your question.
Also read a good C programming book, like Modern C.
In 2021, you could be interested in the DECODER project. It is strongly related to your question.
An interesting book describing a piece of software that generates all the half-million lines of its own C source (which is publicly available) is Pitrat's Artificial Beings: the Conscience of a Conscious Machine (ISBN 978-1848211018). You could enjoy reading that book.
See also this answer about generating C code.
You could also be interested in GNU bison. It is a free software parser generator, mostly written in C, and it generates C code. My recommendation is to download its source code and take inspiration from it.
From experience, when you are generating C code, it is easier to improve the generator (so that the generated C code compiles cleanly without warnings) than to change the generated C code. But do document your C code generator well. Read about partial evaluation techniques.

What is the difference between stdio.c and stdio.h?

Couldn't stdio functions and variables be defined in header files, without having to use .c files?
If not, what are .c files used for?

The functions declared in the header file have to be implemented somewhere. The .c file contains the implementation, though it has already been compiled into a static or shared library that your compiler can use.
The header file should contain a minimal description of the function to save time when compiling. If it included the entire source it'd force the compiler to rebuild it each and every time you compile which is really wasteful since that source never changes.
In effect, the header file serves as a cheat sheet on how to interact with the already compiled library.
The reason the .c files are provided is primarily for debugging, so your debugger can step through in your debug build and show you source instead of raw machine code. In rare cases you may want to look at the implementation of a particular function in order to better understand it, or in even more rare cases, identify a bug. They're not actually used to compile your program.
In your code you should only ever reference the header file version, the .h via an #include directive.

stdio.h is a standard header, required to be provided by every conforming hosted C implementation. It declares, but does not define, a number of entities, mostly library functions like putchar and scanf.
stdio.c, if it exists, is likely to be a C source file that defines the functions declared in stdio.h. There is no requirement that an implementation must make it available. It might not even exist; for example the implementations of the functions declared in stdio.h might appear in multiple *.c files.
The declaration of putchar is:
int putchar(int c);
and that's all the compiler needs to know when it sees a call to putchar in your program. The code that implements putchar is typically provided as machine code, and the linker's job is to resolve your putchar() call so it ends up invoking that code. putchar() might not even be written in C (though it probably is).
An executable program can be built from multiple *.c source files. One and only one copy of the code that implements putchar is needed for an entire program. If the implementation of putchar were in the header file, then it would be included in each separately compiled source file, creating conflicts and, at best, wasting space. The code that implements putchar() (and all the other functions in the library) only needs to be compiled once.

Each .c file holds the functions for a particular purpose. For example, stdio.c contains the standard input/output functions used by C programs. The header stdio.h contains the prototypes of all the stdio.c functions, along with the related defines, macros, and so on. When you #include <stdio.h> in your own .c file, your code assumes that a function "int printf(const char *format, ...)" exists, that it returns an int, and that it accepts the given arguments. When you call printf(), you are actually using the code from stdio.c.

There are languages where if you want to make use of something someone else has written, you say something like
import module
and that takes care of everything.
C is not one of those languages.
You could put "library" source code in a file, and then use #include to pull it in wherever you needed it. But this wouldn't work at all, for two reasons:
If you used #include to pull it in from two different source files, and then linked the two resulting object files together, everything in the "library" would be defined twice.
You might not want to deliver your "library" code as source; you might prefer to deliver it in compiled, object form.

Are libraries in C statically linked?

I'm starting to program in Rust and one of the first things I noticed is that Rust produces large binaries. For example, Rust's "Hello world!" binary is ~600K large, while the equivalent C binary is ~8K large.
After some searching I found this SO post which explains that Rust binaries are large because all the needed libraries are statically linked. But isn't that the case for C as well? When I write #include <stdio.h> in C, aren't I statically linking the relevant I/O libraries as well? I have always assumed that answer is 'yes' but now I am doubting myself.

#include copies the file contents into the source file, but if the header contains nothing more than function declarations, all that does is tell the compiler that those functions are available to be called in your code. The actual implementation may be defined in another file that needs to be linked (either statically or dynamically) into your executable. If you look at stdio.h, you will see that it contains only declarations (function prototypes, types, and macros) rather than the implementations of the functions.
Many compilers provide options to do either static or dynamic linking for the standard libraries.

About C compilation process and the linking of libraries

Are C libraries linked with object code, or first with source code and only later with object code? I mean, look at the image found on the Cardiff School of Computer Science & Informatics website:
It's "strange" that the libraries are linked after the object code has been generated. I mean, we use the source code when we write the includes!
So how does this actually work? Thanks!

That diagram is correct.
When you #include a header, it essentially copies that header into your file. A header is a list of types and function declarations and constants, etc., but doesn't contain any actual code (C++ and inline functions notwithstanding).
Let's have an example.
library.h:
int foo(int num);
library.c:
int foo(int num)
{
    return num * 2;
}
yourcode.c:
#include <stdio.h>
#include "library.h"

int main(void)
{
    printf("%d\n", foo(100));
    return 0;
}
When you #include library.h, you get the declaration of foo(). The compiler, at this point, knows nothing else about foo() or what it does. The compiler can freely insert calls to foo() despite this. The linker, seeing a call to foo() in the object code for yourcode.c, and seeing the compiled code from library.c, knows that any calls to foo() should go to that code.
In other words, the compiler tells the linker what function to call, and the linker (given all the object code) knows where that function actually is.
(Thanks to Fiddling Bits for corrections.)

Includes from libraries normally contain only the library interface. In the simplest case, the .h file provided with the library contains the function declarations, and the compiled functions are in the library file. So you compile your sources against the declarations from the library headers, and then the linker adds the compiled library functions to your executable.

It might be instructive to look at what each piece in the tool-chain does, so using the boxes in your image.
pre-processor
This is really a text editor doing a bunch of substitutions (OK, really, really oversimplified). Some of the things that the pre-processor does are:
It performs simple text-based substitution on #defines. So if we have #define PI 3.1415 in our file and later a line such as angle = angle * PI / 180; the pre-processor will convert this line into angle = angle * 3.1415 / 180;
Any time it encounters an #include, we can imagine that the pre-processor fetches the entire contents of that file and pastes them in where the #include is (and then goes back and performs the substitutions).
We can also pass options to the compiler with the #pragma directive.
Finally, we can see the results of running the pre-processor by using the -E option to gcc.
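The textual nature of the substitution can be seen in a short sketch. PI is the macro from the description of #define substitution; SQUARE, deg_to_rad and square_of_sum are hypothetical names added here for illustration.

```c
#define PI 3.1415          /* object-like macro, as in the text */
#define SQUARE(x) x * x    /* deliberately unparenthesized to expose the pasting */

/* After preprocessing, this body reads: return angle * 3.1415 / 180; */
double deg_to_rad(double angle)
{
    return angle * PI / 180;
}

/* After preprocessing, this body reads: return 1 + 2 * 1 + 2;
 * That evaluates to 5, not 9, because the macro argument is pasted as text. */
int square_of_sum(void)
{
    return SQUARE(1 + 2);
}
```

Running gcc -E on a file containing these definitions shows the expanded bodies directly, with no trace of PI or SQUARE left.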
compiler
The output of the pre-processor is still text, and it now contains everything that the compiler needs to be able to process the file. The compiler does a lot of things (I normally break this box up when I describe the process): it performs lexical analysis on the text, passes the tokens to the parser, which verifies that the program satisfies the grammar of the language, produces an intermediate representation, performs optimization, and produces assembly code.
We can see the result of running everything up to, but not including, the assembler by using the -S option to gcc.
assembler
The output of the compiler is an assembly listing, which is then passed to an assembler (most commonly gas, the GNU assembler, on Linux) that converts the assembly code into machine code. In addition, one task of the assembler is to build a list of undefined references (e.g. a library function, or a function that you wrote that is implemented in another source file).
We can see the output of the assembler by using the -c option to gcc.
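As a rough illustration of such an undefined reference, consider the sketch below. The function name report is hypothetical, and the nm behaviour described in the comments is what a typical Linux/ELF toolchain shows; other platforms may differ.

```c
#include <stdio.h>

/* Compiling this file with `gcc -c` produces an object file in which
 * `report` is defined, while `puts` is only referenced: it is recorded
 * as an undefined symbol for the linker to resolve later.
 * On a typical Linux toolchain, running `nm` on the object file lists
 * `report` with type T (defined in the text section) and `puts` with
 * type U (undefined). */
int report(void)
{
    return puts("resolved by the linker");
}
```

The object file on its own is therefore not runnable; it becomes a program only once the linker has matched puts with the code in the C library.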
linker
The input to the linker is the output from the assembler (typically called object files, which use the extension .o), as well as various libraries. Conceptually, the linker is responsible for hooking everything together, including fixing up the calls to functions that are found in libraries. Normally, the program that performs the linking on Linux is ld, and we can see the result of linking just by running gcc without any special command line options.
I have simplified the discussion of the linker, but I hope I gave you a flavor of what it does.
The only issue I have with the image you referenced is that I would have moved the phrase "Object Code" to immediately below the assembler box, and at the same time moved the arrow labeled "Libraries" down. I feel this would indicate that the object code from the assembler and the libraries are combined by the linker to make an executable.

The Compilation Process of C with

Two basic question about compiling and libraries

I have two semi-related questions.
My first question: I can call functions in the standard library without compiling the entire library by just:
#include <stdio.h>
How would I go about doing the same thing with my header files? Just "including" my plaintext header files obviously does not work.
#include "nameofmyheader.h"
Basically, how can I create a library that other files can call?
Second question: Suppose I have a program that is split into 50 c files and a header file. What is the proper way to compile it besides:
cc main.c 1.h 1.c 2.c 3.c 4.c 5.c 6.c 7.c /*... and so on*/
Please correct any misconceptions I am having. I'm totally lost here.

First, you're a bit confused as to what happens with an #include. You never "compile" the standard library. The standard library is already compiled and is sitting in library files (.dll and .lib files on Windows, .a and .so on Linux). What the #include does is give you the declarations needed to link to the standard library.
The first thing to understand about #include directives is that they are very low-level. If you have programmed in Java or Python, #includes are very different from imports. An import tells the compiler at a high level "this source file requires the use of this package", and the compiler figures out how to resolve that dependency. An #include directive in C says "take the entire contents of this file and literally paste it in right here when compiling." In particular, #include <stdio.h> brings in a file that has the forward declarations for all of the I/O functions in the standard library. Then, when you compile your code, the compiler knows how to make calls to those functions and check them for type-correctness.
Once your program is compiled, it is linked to the standard library. This means that your linker (which is automatically invoked by your compiler) will either cause your executable to make use of the shared standard library (.dll or .so), or will copy the needed parts of the static standard library (.lib or .a) into your executable. In neither case does your executable "contain" any part of the standard library that you do not use.
As for creating a library, that is a bit of a complicated topic and I will leave that to others, particularly since I don't think that's what you really want to do based on the next part of your question.
A header file is not always part of a library. It seems that what you have is multiple source files, and you want to be able to use functions from one source file in another source file. You can do that without creating a library. All you need to do is put the declarations for the things in foo.c that you want accessible from elsewhere into foo.h. Declarations are things like function prototypes and "extern" variable declarations. For example, if foo.c contains
int some_global;
void some_function(int a, char b)
{
/* Do some computation */
}
Then in order to make these accessible from other source files, foo.h needs to contain
extern int some_global;
void some_function(int, char);
Then, you #include "foo.h" wherever you want to use some_global or some_function. Since headers can include other headers, it is usual to wrap headers in "include guards" so that declarations are not duplicated. For example, foo.h should really read:
#ifndef FOO_H
#define FOO_H
extern int some_global;
void some_function(int, char);
#endif
This means that the header will only be processed once per compilation unit (source file).
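The effect of the guard can be sketched in a single file by pasting the header text twice, which is what two #include "foo.h" lines amount to. The struct point type and the make_point and point_sum functions are hypothetical additions, not part of the original foo.h. A type definition is used because, unlike a prototype, it may not legally be repeated.

```c
/* First inclusion of foo.h: FOO_H is not yet defined, so the body is kept. */
#ifndef FOO_H
#define FOO_H
struct point { int x; int y; };   /* hypothetical type; defining it twice would be an error */
int point_sum(struct point p);
#endif

/* Second inclusion of foo.h: FOO_H is now defined, so the preprocessor
 * discards everything up to #endif and the struct is not redefined. */
#ifndef FOO_H
#define FOO_H
struct point { int x; int y; };
int point_sum(struct point p);
#endif

/* foo.c side: definitions matching the declarations above. */
struct point make_point(int x, int y)
{
    struct point p = { x, y };
    return p;
}

int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Removing the #ifndef/#define/#endif lines from both copies makes the compiler reject the second struct point as a redefinition.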
As for how to compile them, never put .h files on the compiler command line, since they should not contain any compile-able code (only declarations). In most cases it is perfectly fine to compile as
cc main.c 1.c 2.c 3.c ... [etc]
However, if you have 50 source files, it is probably a lot more convenient to use a build system. On Linux, this is typically make with a Makefile. On Windows, it depends on what development environment you are using. You can search for that, or ask another SO question once you specify your platform (as this question is pretty broad already).
One of the advantages of a build system is that they compile each source file independently, and then link them all together, so that when you change only one source file, only that file needs to be re-compiled (and the program re-linked) rather than having everything re-compiled including the stuff that didn't get changed. This makes a big time difference when your program gets large.

You can combine several .c files to a library. Those libraries can be linked with other .c files to become the executable.
You can use a makefile to create a big project.
The makefile has a set of rules. Each rule describes the steps needed to create one piece of the program and their dependencies with other pieces or source files.

You need to create a shared library. The standard library is itself a shared library that is implicitly linked into your program.
Once you have your shared library, you can use its .h files and compile your program with -lyourlib (for libc this option is implicit).
Create one using:
gcc -shared -fPIC test.c -o libtest.so
And then compile your program like this (-L. tells the linker to also look for libtest.so in the current directory):
gcc myprogram.c -L. -ltest -o myprogram
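For context, the two files named in those commands might look like the sketch below. The function test_add, the caller compute and the header test.h are hypothetical; the original answer does not say what test.c contains. Everything is shown in one block, with comments marking where each file would begin.

```c
/* --- test.h: hypothetical public interface of libtest.so --- */
#ifndef TEST_H
#define TEST_H
int test_add(int a, int b);
#endif

/* --- test.c: compiled into libtest.so --- */
int test_add(int a, int b)
{
    return a + b;
}

/* --- myprogram.c: would #include "test.h" and be linked with -ltest --- */
int compute(void)
{
    /* Only the prototype is visible here; the linker (or the dynamic
     * loader at run time) connects this call to the code in libtest.so. */
    return test_add(2, 3);
}
```

In the real layout, myprogram.c sees nothing of test.c except the prototype in test.h, just as user code sees nothing of libc except its headers.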
For your second question I advise you to use Makefiles
http://www.gnu.org/software/make/

The standard library is already compiled and placed on your machine, ready to be dynamically linked. This means that the library is loaded dynamically when a program needs it. Compare this to a static library, which gets linked into your program when you run the compiler/linker.
This is why you need to compile your code and not the standard library code. You could build a dynamic (shared) library yourself.
For reference, #include <stdio.h> does not IMPORT the standard library. It just allows the compiler and linker to see the public interface of the library (to know which functions are used, what parameters they take, what types are defined, what sizes they are, etc.).
Dynamic Loading
Shared Library
You could split your files up into modules, and create shared libraries. But generally as projects get bigger you tend to need a better mechanism to build your program (and libraries). Rather than directly calling the compiler when you need to do a rebuild you should use a make program or a complete build system like the GNU Build System.

If you really want it to be as simple as just including a .h file, all of your "library" code needs to be in the .h file. However, in this scenario, someone can only include your .h file into one and only one .c file. That may be ok, depending on how someone will use your "library".
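A common variation, offered here as an assumption rather than something the answer states, is to mark header-only functions static inline. Each .c file then gets its own private copy, so the header can be included from several files without duplicate-definition errors at link time. The names mylib.h, twice and use_twice are hypothetical.

```c
/* --- mylib.h: hypothetical header-only "library" --- */
#ifndef MYLIB_H
#define MYLIB_H

/* static gives the function internal linkage, so every translation unit
 * that includes this header has its own copy and the linker never sees
 * two external definitions of twice. */
static inline int twice(int x)
{
    return 2 * x;
}

#endif /* MYLIB_H */

/* --- any .c file that includes mylib.h can then call it directly --- */
int use_twice(void)
{
    return twice(21);
}
```

The trade-off is that the code is recompiled in every file that includes the header, so this tends to suit small helper functions better than large libraries.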