Two basic questions about compiling and libraries - C

I have two semi-related questions.
My first question: I can call functions in the standard library without compiling the entire library by just:
#include <stdio.h>
How would I go about doing the same thing with my header files? Just "including" my plaintext header files obviously does not work.
#include "nameofmyheader.h"
Basically, how can I create a library that other files can call?
Second question: Suppose I have a program that is split into 50 .c files and a header file. What is the proper way to compile it besides:
cc main.c 1.h 1.c 2.c 3.c 4.c 5.c 6.c 7.c /*... and so on*/
Please correct any misconceptions I am having. I'm totally lost here.

First, you're a bit confused as to what happens with an #include. You never "compile" the standard library. The standard library is already compiled and is sitting in library files (.dll and .lib files on Windows, .a and .so on Linux). What the #include does is give you the declarations needed to link to the standard library.
The first thing to understand about #include directives is that they are very low-level. If you have programmed in Java or Python, #includes are quite different from imports. An import tells the compiler at a high level "this source file requires the use of this package," and the compiler figures out how to resolve that dependency. An #include directive in C says "take the entire contents of this file and literally paste it in right here when compiling." In particular, #include <stdio.h> brings in a file that has the forward declarations for all of the I/O functions in the standard library. Then, when you compile your code, the compiler knows how to generate calls to those functions and check them for type-correctness.
Once your program is compiled, it is linked to the standard library. This means that your linker (which is automatically invoked by your compiler) will either cause your executable to make use of the shared standard library (.dll or .so), or will copy the needed parts of the static standard library (.lib or .a) into your executable. In neither case does your executable "contain" any part of the standard library that you do not use.
As for creating a library, that is a bit of a complicated topic and I will leave that to others, particularly since I don't think that's what you really want to do based on the next part of your question.
A header file is not always part of a library. It seems that what you have is multiple source files, and you want to be able to use functions from one source file in another. You can do that without creating a library. All you need to do is put the declarations for the things in foo.c that you want accessible from elsewhere into foo.h. Declarations are things like function prototypes and extern variable declarations. For example, if foo.c contains
int some_global;

void some_function(int a, char b)
{
    /* Do some computation */
}
Then in order to make these accessible from other source files, foo.h needs to contain
extern int some_global;
void some_function(int, char);
Then, you #include "foo.h" wherever you want to use some_global or some_function. Since headers can include other headers, it is usual to wrap headers in "include guards" so that declarations are not duplicated. For example, foo.h should really read:
#ifndef FOO_H
#define FOO_H
extern int some_global;
void some_function(int, char);
#endif
This means that the header will only be processed once per compilation unit (source file).
As for how to compile them: never put .h files on the compiler command line, since they should not contain any compilable code (only declarations). In most cases it is perfectly fine to compile as
cc main.c 1.c 2.c 3.c ... [etc]
However, if you have 50 source files, it is probably a lot more convenient to use a build system. On Linux, this means a Makefile. On Windows, it depends on what development environment you are using. You can google for that, or ask another SO question once you specify your platform (as this question is pretty broad already).
One of the advantages of a build system is that it compiles each source file independently and then links them all together, so that when you change only one source file, only that file needs to be recompiled (and the program relinked) rather than everything being recompiled, including the parts that didn't change. This makes a big difference in build time when your program gets large.
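For instance, a minimal GNU Makefile for the example above might look like this (a sketch using the file names from the question; note that recipe lines must begin with a tab character):
OBJS = main.o 1.o 2.o 3.o

app: $(OBJS)
	cc -o app $(OBJS)

# Rebuild a .o only when its .c (or the shared header) changes
%.o: %.c 1.h
	cc -c $< -o $@
After touching only 2.c, running make recompiles just 2.o and relinks app.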

You can combine several .c files into a library. Those libraries can be linked with other .c files to produce the executable.
You can use a makefile to build a big project.
The makefile has a set of rules. Each rule describes the steps needed to create one piece of the program and its dependencies on other pieces or source files.
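For example, with the GNU toolchain you could combine sources into a static library like this (file names hypothetical):
gcc -c a.c b.c c.c                # compile each source file into an object file
ar rcs libmystuff.a a.o b.o c.o   # archive the object files into a static library
gcc main.c -L. -lmystuff -o app   # link the library with main.c into an executable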

You need to create a shared library; the standard library is itself a shared library that is implicitly linked into your program.
Once you have your shared library, you can use the .h files and just compile the program with -lyourlib, which is what happens implicitly for the C library.
Create one using (-fPIC makes the object code position-independent, which shared libraries require):
gcc -shared -fPIC test.c -o libtest.so
And then compile your program like (with -L. so the linker can find libtest.so in the current directory):
gcc myprogram.c -L. -ltest -o myprogram
Note that at run time the dynamic loader must also be able to find libtest.so, e.g. via LD_LIBRARY_PATH or an rpath.
For your second question, I advise you to use Makefiles:
http://www.gnu.org/software/make/

The standard library is already compiled and placed on your machine, ready to be dynamically linked. This means that the library is loaded at run time when needed by a program. Compare this to a static library, which gets compiled INTO your program when you run the compiler/linker.
This is why you need to compile your code and not the standard library code. You could build a dynamic (shared) library yourself.
For reference, #include <stdio.h> does not IMPORT the standard library. It just allows the compiler and linker to see the public interface of the library (to know what functions exist, what parameters they take, what types are defined, what sizes they are, etc.).
See also: Dynamic Loading, Shared Library.
You could split your files up into modules and create shared libraries. But generally, as projects get bigger, you tend to need a better mechanism to build your program (and libraries). Rather than directly calling the compiler when you need to do a rebuild, you should use a make program or a complete build system like the GNU Build System.

If you really want it to be as simple as just including a .h file, all of your "library" code needs to be in the .h file. However, in this scenario, someone can only include your .h file into one and only one .c file. That may be ok, depending on how someone will use your "library".
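To see why, here is a minimal sketch of such a header-only "library" (hypothetical names):
/* mini.h -- the whole "library" lives in the header */
#ifndef MINI_H
#define MINI_H
int mini_add(int a, int b)  /* a full definition, not just a declaration */
{
    return a + b;
}
#endif
The include guard only protects a single .c file from double inclusion. If two different .c files both include mini.h and are linked together, each object file carries its own copy of mini_add, and the linker reports a multiple-definition error.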


Do C compilers remove duplicate common includes from included libraries?

I am new to C and am just learning the basics of modularising my code for neatness and maintainability. I am reading a lot of people saying not to include .c files directly but instead to use .h files with associated .c files.
My question is: when writing a library which is exposed/included via its .h file, does the compiler dedupe common includes, or are they included each time they are referenced?
For instance in my above application, I am using printf in my main and also in my foo library.
When running:
gcc -o app main.c foo/foo.c && ./app
I get the expected outputs printed to the console. However, does the compiler remove duplicates of the <stdio.h> include, or is it included once for main.c and once again for foo.c?
No, the compiler does not remove them. Nor should it, because sometimes (although it's rare) headers are written with the purpose of being included several times with different effects each time. So the compiler can't just omit these subsequent inclusions.
That's why people put include guards in headers (#ifndef FOO_H_ in this case.)
Each file, regardless of whether is a .h or .c file, should include what it needs. It should not rely that a header has already been included somewhere else. If something is included twice in the current compilation unit, the include guards will make sure headers are only included once, regardless of how many files try to include them.
As a side note, #pragma once, even though it's not in the C standard, is a de-facto standard compiler extension. So you can just do:
#pragma once
void foo();
It's one of those rare cases where a non-standard compiler extension is so widely supported that it's safe to use.
On the contrary, each compilation unit ("main.c" and "foo.c" in your case) needs that include. Otherwise the compiler would not know the prototype of printf() (note). Each compilation unit (aka "module") is compiled on its own.
You may be mixing up headers and linkable files (object code files and libraries).
The contents of a header file replace the #include line during preprocessing. "stdio.h" contains only the prototype of printf(), among a lot of other stuff, not the implementation of the function.
If the compiler generates the object code for "main.c" and "foo.c", each of them includes an unresolved reference to printf().
Finally, the linker will include the object code for printf(), but just once. This single instance of the function is called by both callers. This is where what you seem to be asking about actually happens.
You might wonder why you don't have to add the library to your command line. This is a convenience feature of most compiler drivers, as nearly all applications want the standard libraries. You might like to add "-v" to the command line to see what really happens. Other options can suppress this automation.
Note: Some compilers are quite smart and know a lot of standard functions. They will accept the source and produce a nice warning. But don't rely on this.
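If you want to watch this happen, nm from GNU binutils lists an object file's symbols (a sketch, assuming a typical Linux setup):
gcc -c main.c   # compile only, do not link
nm main.o       # symbols marked "U" are undefined in this object file
The standard library calls show up marked "U" until the linker resolves them against the library.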

What is the difference between stdio.c and stdio.h?

Couldn't stdio functions and variables be defined in header files without having to use .c files?
If not, what are .c files used for?
The functions declared in the header file have to be implemented somewhere. The .c file contains the implementation, though for the standard library these have already been compiled into a static or shared library that your compiler can use.
The header file should contain a minimal description of the functions, to save time when compiling. If it included the entire source, the compiler would be forced to rebuild it each and every time you compile, which is really wasteful since that source never changes.
In effect, the header file serves as a cheat sheet on how to interact with the already compiled library.
The reason the .c files are provided is primarily for debugging, so that in a debug build your debugger can step through source instead of raw machine code. In rarer cases you may want to look at the implementation of a particular function to understand it better or, more rarely still, to identify a bug. They're not actually used to compile your program.
In your code you should only ever reference the header file version, the .h via an #include directive.
stdio.h is a standard header, required to be provided by every conforming hosted C implementation. It declares, but does not define, a number of entities, mostly library functions like putchar and scanf.
stdio.c, if it exists, is likely to be a C source file that defines the functions declared in stdio.h. There is no requirement that an implementation must make it available. It might not even exist; for example the implementations of the functions declared in stdio.h might appear in multiple *.c files.
The declaration of putchar is:
int putchar(int c);
and that's all the compiler needs to know when it sees a call to putchar in your program. The code that implements putchar is typically provided as machine code, and the linker's job is to resolve your putchar() call so it ends up invoking that code. putchar() might not even be written in C (though it probably is).
An executable program can be built from multiple *.c source files. One and only one copy of the code that implements putchar is needed for an entire program. If the implementation of putchar were in the header file, then it would be included in each separately compiled source file, creating conflicts and, at best, wasting space. The code that implements putchar() (and all the other functions in the library) only needs to be compiled once.
Each .c file contains functions for a specific purpose. For example, stdio.c contains the standard input/output functions used within a C program, while the stdio.h header contains the prototypes for all of the stdio.c functions, along with the related defines, macros, etc. When you #include <stdio.h> in your code.c file, your code can assume there is an int printf(const char *format, ...) function: it returns an int value, and you can pass it a format string and further arguments. When you call printf(), the code that actually runs comes from stdio.c.
There are languages where if you want to make use of something someone else has written, you say something like
import module
and that takes care of everything.
C is not one of those languages.
You could put "library" source code in a file, and then use #include to pull it in wherever you needed it. But this wouldn't work at all, for two reasons:
If you used #include to pull it in from two different source files, and then linked the two resulting object files together, everything in the "library" would be defined twice.
You might not want to deliver your "library" code as source; you might prefer to deliver it in compiled, object form.

How to correctly include own libraries in function files and project files

I got stuck trying to do Exercise 8-3 of K&R. The goal of the exercise is to rewrite some functions of stdio.h, such as fopen, fclose, fillbuf and flushbuf.
Here's how my source files are organized:
stdio.h: contains type and macro definitions, and the declarations of some functions proper to the library. All content of the file is enclosed between #ifndef/#endif lines as follows:
#ifndef STDIO_H
#define STDIO_H
/* content of stdio.h */
#endif
myfunction.c: I have one .c file per function; each file has a #include "stdio.h" line to load all the needed type definitions.
main.c: where I have code to test my functions; main.c also has a #include "stdio.h" line.
My problem is the following: when I try to compile all my files using gcc, I run into the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included (_iob is a variable I only defined inside my stdio.h)... why is this happening? I thought the #ifndef line was specifically there to prevent such errors.
More generally:
How would you go about making your own header files and library/function files and using them in your projects?
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions?
Please become aware of the difference between a library and its header files.
A library is a (collection of) binary machine code (with some additional meta-data, e.g. relocation directives to the linker).
For example, on my Linux system, dynamic libraries are generally shared objects (e.g. /usr/lib/x86_64-linux-gnu/libgmp.so) and it makes absolutely no sense to try some preprocessor directive like #include "libgmp.so" //wrong.
But a library has some API. That API is given by some documentation and by some header file(s), e.g. gmp.h and you should #include "gmp.h" in any C code (your C translation unit) which uses it.
myfunction.c: I have a .c file per function
Having one file per function is often poor taste. You can generally group related functions. For example, in your case you probably want to define your myfopen and myfclose functions in the same myopenclose.c translation unit (even if you don't have to), because these two functions are intimately related. As a rule of thumb, I prefer having source files of one or a few thousand lines each (but that is really a matter of taste, and some people like having many small files).
Remember that what the compiler really sees is the preprocessed form of code. Consider asking your compiler to produce that form (e.g. from foo.c you can get its preprocessed form foo.i with gcc -C -E -Wall foo.c > foo.i on my Linux desktop) and look into it. Try that on your own files (e.g. your myopenclose.c if you have one).
If you have many small files, the compiler is probably including the same headers in each of them, and those included declarations get compiled every time. BTW, notice that gcc is only a driver program. Use it with the -v flag: you'll see that it runs cc1 (the C compiler proper), as (the assembler), ld (the linker), etc.
I run into the error:
multiple definition of `_iob'
on every one of my function files where my stdio.h is included, (_iob is a variable I only defined inside my stdio.h).
You probably should declare extern your _iob global variable in your stdio.h and define a global _iob in only one implementation file (perhaps myopenclose.c, if it is relevant) of your library.
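Concretely, that looks like this (a minimal sketch, assuming FILE and OPEN_MAX are defined in your own stdio.h, as in K&R):
/* in your stdio.h: a declaration only -- no storage is allocated */
extern FILE _iob[OPEN_MAX];

/* in exactly one .c file of your library: the single definition */
#include "stdio.h"
FILE _iob[OPEN_MAX];
Every translation unit that includes the header then refers to the same array, and the linker sees only one definition of _iob.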
Don't confuse definitions and declarations (of variables, functions, types, etc.). Spend some time reading the C11 standard n1570; these words are defined there. As a rule of thumb, declarations should go into header .h files and definitions (of variables and functions) into implementation .c files (of course the details are more complex: you often, but not always, define types and structs in header files).
I strongly recommend using some Linux distribution (it is very developer- and student- friendly) and studying the source code of some existing free software C standard library (like musl-libc, whose code is quite readable). More generally, study the source code of existing free software projects (e.g. on github). They will inspire you.
Is there a way to make the linker figure out the position of my functions just by including the header file, the same way it does for standard functions?
This shows a lot of confusion (the above question does not make any sense as asked). Read more about compilers (your cc1 program, started by gcc, translates a .c file into an object file .o) and about linkers (your ld, generally started by gcc, agglomerates several object files, processes the relocations inside them, and produces an ELF library or executable). The preprocessing (e.g. of #include directives) is done at compile time by cc1. The linker never sees header files (it only deals with object files and libraries).
If you rewrite some of the system declarations and functions while at the same time including the system declarations, you can expect some collisions.
Header files (.h) contain code (usually only declarations), and the mechanism you describe (#ifndef STDIO_H) prevents multiple inclusion of the same header file, mainly because another header that has already been loaded might also include it. That results in the same kind of collision you had.
In C, you could, for instance:
make a new header file that contains your own declarations plus the stdio ones that don't collide with yours
use the stdio declarations, and only write new functions that use the same structures, defines, enums, etc. as stdio
rewrite the necessary declarations and code so that you no longer need to include the system headers
use another naming convention, like my_iob, in both your header file and your code.
The last two are probably the best in your case, since you still have some collisions coming from a header file.
For instance, your code might not include stdio.h, but another header file you include might do it, indirectly...

C questions about static and shared libraries

I have some .c and .h files, with the main function encapsulated in MAIN_FUNC.c. I need to pass them to a guy who is going to integrate MAIN_FUNC() with his files.
However since my algorithm is confidential I can't just send the .c and .h files and so I've been looking into static and shared libraries. However I still have some doubts.
1: In every tutorial that I've seen, the .h files are needed as well. Is there any way that I can send the guy just one single library file that he can #include in his code?
2: Even if I have to pass the .h files, do I really need to pass all of them? How can I give him only libMAIN_FUNC.a and MAIN_FUNC.h?
3: With the .a or .so libraries, is there any way of reverse engineering the files so that one can see the .c and .h code?
No, you must provide him with at least one .h file.
No, you need to pass only those headers that define the interface between your library and its user. I suggest you read about the pimpl idiom.
Theoretically yes, your .a and .so files can be reverse-engineered, but it is very nontrivial.
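Regarding point 2, the C flavor of the pimpl idiom is an opaque struct: the header exposes only a typedef and function prototypes, while the struct's fields stay hidden in the .c file. A minimal sketch with hypothetical names:
/* MAIN_FUNC.h -- all the user ever sees */
typedef struct algo_state algo_state;  /* opaque type: fields are not visible here */

algo_state *algo_create(void);
int algo_run(algo_state *s, int input);
void algo_destroy(algo_state *s);
The definition of struct algo_state lives only in the confidential .c file, so the header reveals nothing about the algorithm's internals.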
My understanding is that you have a .c and .h file that someone else will be implementing, but you want to keep your code confidential.
If your only concern is handing out source code, then there is always the option of partial compilation. If you have gone through the trouble of making sure your code works without issue, you can partially compile your program into a .o file.
I don't know the details of your code, but if this other person you've mentioned will just be implementing your functions like a library, then the .o is all he would need.
Sample makefile for your end:
all: MAIN_FUNC.o

MAIN_FUNC.o: MAIN_FUNC.c MAIN_FUNC.h
	gcc -c MAIN_FUNC.c
Sample makefile for the other guy's end:
all: main

main: main.c main.h MAIN_FUNC.o
	gcc -o main main.c MAIN_FUNC.o
A lot of companies do this sort of thing in order to protect their property. When one company sells software to another, they oftentimes sell these .o files. You would only need to provide a description of what each function does (e.g., "This function takes an input from the console and returns the number of words written, as an integer"), something basic that allows others to use your work without revealing your source code.
First things first, on reverse engineering. Given infinite time and resources, your code can always be reverse engineered. Having said that, your objective is to make it impractical for others to reverse engineer your code.
Now to answer your question:
Generating an executable binary from C code happens in two major steps: compiling and linking.
After compiling, your .c files become object files (machine code). They are not executable yet.
If you have two files: file1.c and file2.c, you will get file1.o and file2.o for example.
Now, the code in file1.c may call a function which exists in file2.o. At the compilation stage, all that file1.c needs to know is the function prototype.
When the linker is invoked to generate the executable binary, it makes sure that the function called from file1.o exists somewhere, such as in file2.o.
How this affects you:
The header file should not be proprietary (but perhaps it is for legal reasons). The header file is mainly used to tell other .c files what functions and return values to expect (declaration, not implementation).
Now perhaps you have some proprietary function prototypes for whatever reason which you don't want to expose to the world. Say you want the world to start your code by calling the function
start_magic();
Then, what you do is:
Provide a header file, magic.h, to be included in main.c
The header file will declare the function: void start_magic();
You then put your proprietary code in algo.c and algo.h
algo.c will contain the implementation of start_magic()
algo.c will include the proprietary algo.h
Now what you can do is compile (without linking) your algo.c file and strip the debugging symbols to make it harder to reverse engineer. How this is done depends on the compiler you are using.
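With the GNU toolchain, for example, that could look like this (a sketch):
gcc -c algo.c -o algo.o     # compile only, no linking
strip --strip-debug algo.o  # remove debugging symbols from the object file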
Now you can provide the object file and the header file to somebody who wants to call the function start_magic().
The implementer of main has to link the program using the object file you provided.
Example
Assume you have algo.c with your algorithms. Let us say algo.c has the function:
float sqrt(float x)
{
    return taylor_approx(x);
}
Suppose that the sqrt function will be shared with the supplier. However, sqrt calls the proprietary function taylor_approx(x) to calculate the square root.
You can create an algo.h file to be sent to the users, which contains:
extern float sqrt(float x);
Then you can send your compiled object file (stripped of debugging symbols), for example algo.o, to the users and ask them to include algo.h in their main.c.
Note that this is one way to do it.
1) You can compile your C file into a library (*.a or whatever), given that it is written properly, and distribute it along with the .h file. You have to provide the .h files, as they are the interface to your library, which is just a binary blob otherwise.
2) You need to pass the headers declaring the public interface your library exports, i.e. the functions and symbols you want the user of the library to have access to.
3) Yes, there is always a way of reverse engineering just about anything. The only question is the gain/effort ratio.

Do I need to compile the header files in a C program?

Sometimes I see someone compile a C program like this:
gcc -o hello hello.c hello.h
As far as I know, we just need to include the header files in the C program like:
#include "somefile"
and compile the C program: gcc -o hello hello.c.
When do we need to compile the header files, and why?
Firstly, in general:
If these .h files are indeed typical C-style header files (as opposed to being something completely different that just happens to be named with .h extension), then no, there's no reason to "compile" these header files independently. Header files are intended to be included into implementation files, not fed to the compiler as independent translation units.
Since a typical header file usually contains only declarations that can be safely repeated in each translation unit, it is perfectly expected that "compiling" a header file will have no harmful consequences. But at the same time it will not achieve anything useful.
Basically, compiling hello.h as a standalone translation unit is equivalent to creating a degenerate dummy.c file consisting only of the #include "hello.h" directive and feeding that dummy.c file to the compiler. It will compile, but it will serve no meaningful purpose.
Secondly, specifically for GCC:
Many compilers will treat files differently depending on the file name extension. GCC has special treatment for files with .h extension when they are supplied to the compiler as command-line arguments. Instead of treating it as a regular translation unit, GCC creates a precompiled header file for that .h file.
You can read about it here: http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html
So, this is the reason you might see .h files being fed directly to GCC.
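For example (a sketch of GCC's behavior):
gcc hello.h           # produces hello.h.gch, a precompiled header
gcc -o hello hello.c  # GCC picks up hello.h.gch automatically where hello.c includes hello.h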
Okay, let's understand the difference between active and passive code.
The active code is the implementation of functions, procedures and methods, i.e. the pieces of code that should be compiled to executable machine code. We store it in .c files, and we certainly need to compile it.
The passive code is not executed itself; it is needed to tell the different modules how to communicate with each other. Usually, .h files contain only prototypes (function headers) and structures.
An exception is macros, which formally can contain active pieces. But you should understand that they are used at a very early stage of building (preprocessing) via simple substitution; by compile time, the macros have already been substituted into your .c file.
Another exception is C++ templates, which have to be implemented in .h files. But the story here is similar to macros: they are substituted at an early stage (instantiation), and formally each instantiation is another type.
In conclusion, I think that if the modules are formed properly, we should never need to compile the header files.
When we include a header file like #include <header.h> or #include "header.h", the preprocessor takes it as input and includes the entire file in the source code: the preprocessor replaces the #include directive with the contents of the specified file.
You can check this with the -E flag to GCC, which stops after preprocessing and writes out the preprocessed .i file; alternatively, you can run cpp (the C preprocessor on Linux) directly, which the compiler driver invokes automatically when you execute GCC.
So the header is actually compiled along with your source code; there is no need to compile it separately.
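For example (a sketch):
gcc -E hello.c -o hello.i   # stop after preprocessing; hello.i contains the expanded source with headers pasted in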
On some systems, attempts to speed up the translation of fully resolved .c files call the pre-processing of include files "compiling header files". However, this is an optimization technique that is not necessary for actual C development.
Such a technique basically processed the include statements and kept a cache of the flattened result. Normally the C toolchain pastes the included files in recursively and then passes the entire item off to the compiler. With a precompiled header cache, the toolchain checks whether any of the inputs (defines, headers, etc.) have changed; if not, it provides the already-flattened text snippets to the compiler.
Such systems were intended to speed up development; however, many of them were quite brittle. As computers sped up and source code management techniques changed, fewer header precompilers are actually used in common projects.
Until you actually need that compilation optimization, I highly recommend you avoid precompiling headers.
I think we do need to preprocess (though maybe not compile) the header file, because from my understanding, during the compile stage the header file is included into the .c file. For example, in test.h we have
typedef enum {
    a,
    b,
    c
} test_t;
and in test.c we have
void foo()
{
    test_t test;
    ...
}
During compilation, I think the compiler will put the code from the header file and the .c file together: the code in the header file is preprocessed and substituted into the .c file. Meanwhile, we'd better define the include path in the makefile.
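For example, if test.h lived in a separate include/ directory (a hypothetical layout), the makefile rule could pass the include path with -I:
gcc -Iinclude -c test.c -o test.o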
You don't need to compile header files. Compiling one doesn't actually produce anything usable on its own, so there's no point. However, it is a great way to check a header for typos, mistakes and bugs, which will make things easier later.
