How does the compiler link libraries into your code? - c

So I'm going through the CS50 introduction course.
And I'm confused about the fact that if you write #include some header file in an IDE, that tells the computer to find some library, and your compiler will find that code and combine it with your code.
But then how does the compiler find that code? Like, I just type in #include <cs50.h>, for example. But how does it find that code when I don't have it on my PC? Why would I have it without finding it online and downloading it beforehand? Does it look online and then download the file, which it uses now and in the future for my programs when I call #include? And if so, does it mean that you can't properly program without a connection to the internet?

And I'm confused about the fact that if you write in an IDE #include some header file, which tells the computer to find some library,
Be careful with terminology. #include tells the compiler to find some header. The word library in C usually refers to the file with compiled code (.a, .lib, .so, .dll).
and your compiler will find that code and combine it with your code.
This is right. By and large, the effect is the same as copying and pasting the contents of the header in the place of the #include statement.
But then how does the compiler find that code? Like I just type in #include <cs50.h> for example.
The compiler has default search paths built in, where it looks for the header being #included. Typically directories like /usr/include and /usr/local/include are built into the compiler, as well as the current directory ".". You can add more search paths through compiler command line arguments, for example -I/some/path in gcc and clang.
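For instance, a minimal sketch of how the search path is used (the directory /home/me/myheaders and the files myutils.h/myutils.c are made up for illustration):

    /* main.c */
    #include "myutils.h"   /* found via the include search path, not the web */

    int main(void)
    {
        greet();           /* declared in myutils.h, defined in myutils.c */
        return 0;
    }

    /* Add an extra search directory with gcc or clang:
       gcc -I/home/me/myheaders main.c myutils.c -o main */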
But how does it find that code when I don't have it on my PC?
It doesn't. You will get an error like "cs50.h: No such file or directory".
If this works on your university's system, probably the header is installed on that system in some central location.
Why would I have it without finding it online and downloading it beforehand? Does it look online and then download the file, which it uses now and in the future for my programs when I call #include?
It doesn't.
And if so does it mean that you can't properly program without a connection to the internet?
C was developed in the 1970s. The internet barely existed yet. You can program perfectly well in C without an internet connection – if you can do it without StackOverflow, of course ;)

The code of the library must be present on your computer or nothing will work. There's no such thing as magic downloads of libraries; C compilers and linkers have worked the same way since long before the Internet was even invented.
Standard library headers are typically downloaded and installed along with the compiler. The libraries themselves are also very likely already pre-built into some convenient format for the target system. You don't need to worry about manually adding standard libs to your project, since the compiler will take care of that for you.
Custom headers require their corresponding .c files or pre-built libraries to be present too, but those must be added to the current project manually - either by adding them to your IDE's project, or, as in the old days, by writing a makefile. A sketch of the manual route follows below.
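For instance, building and linking a custom module by hand on the command line (the file names util.c, util.h, and main.c are made up for illustration; gcc and ar are the usual GNU tools):

    gcc -c util.c -o util.o        # compile the module to an object file
    ar rcs libutil.a util.o        # bundle object file(s) into a static library
    gcc main.c -L. -lutil -o main  # compile main.c and link against libutil.a

For small projects, simply listing all the .c files also works: gcc main.c util.c -o main.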
How that works in CS50 I don't know, but the lib obviously comes pre-installed somehow, and they have hidden the how from the students. We wouldn't want to risk CS50 students actually learning how programming works, now would we...

if you write in an IDE #include some header file which tells the computer to find some library
Slight misunderstanding. When you type #include "somefile.h" into your program, the first stage of C and C++ compilation, called the pre-processor, will search for that file (called a "header") along the INCLUDE path. That is, the pre-processor will search through the local directory of the .c file, then a specified set of standard directories, and perhaps your own project directories, to find a file called "somefile.h". The INCLUDE path is highly configurable with IDEs, command lines, and environment variables.
Upon finding that file, the contents of that file are effectively substituted directly into your source code, exactly where the #include statement originally appeared. It's as if you had typed that exact file's contents yourself into your .c file. After the textual substitution of the #include statements with the file contents, the intermediate result is handed off to the compiler stage to be translated into object code.
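You can watch this substitution happen yourself. A small sketch (the file names hello.h and hello.c are made up; -E is the standard gcc/clang flag that stops after preprocessing):

    /* hello.h */
    int add(int a, int b);

    /* hello.c */
    #include "hello.h"

    int add(int a, int b) { return a + b; }

    int main(void) { return add(1, 2); }

    /* "gcc -E hello.c" prints hello.c with the #include line replaced
       by the contents of hello.h (plus some line markers). */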
Like I just type in #include <cs50.h> for example. But how does it find that code when I don't have it on my PC?
If the file can't be found, the pre-processor stage of the compile will fail and the whole thing comes to an end. I'm not sure what's in cs50.h, but if you type #include <cs50.h> and it works, then my guess is that your university environment has some project or environment configuration that adds a course-specific include directory to your compilation path. Difficult to say, since you didn't specify how you were building your code. Most IDEs will let you right-click on a #include statement and navigate to the actual source of the header file, so you can inspect its contents and see where it originates from on your disk.
Also, your title says "linking", but you really mean "including". Linking is the final stage, after the compiler compiles all source files to object code, that produces the final executable program.

.h files should (only) contain data type definitions, extern declarations, and function prototypes. .h files are not libraries.
So the compiler will know what parameters the functions take and what their return types are. The compiler will know how to call them. It will also know the types of variables defined somewhere else in the code (including in the libraries). When the compiler compiles the code, it generates intermediate files called object files.
Those files are later linked together by a special program called the linker. This program will find the actual code of the functions in other object files or libraries (libraries are basically sets of object files grouped into one larger file). It happens behind the scenes. If you need to tell the linker to use a specific library, you simply use a command line option. For example, to use the math library you use the -lm command line option.
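For instance, a small sketch of linking against the math library (the file name root.c is made up; sqrt and -lm are standard):

    /* root.c -- calls sqrt(), whose code lives in libm, not in math.h */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        printf("%f\n", sqrt(2.0));
        return 0;
    }

    /* Link the math library explicitly:
       gcc root.c -lm -o root */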
But how does it find that code when I don't have it on my PC?
The library file has to be present in your file system. Otherwise the linker will not be able to link in functions or variables from that library.
And if so does it mean that you can't properly program without connection to internet?
No, it means that you have to have a properly configured toolchain (i.e. all needed libraries present in your file system).
But then how does compiler find that code?
For your level of knowledge: some libraries are linked by default. Others are not, so you need to tell the compiler/linker what you want to use.
--gcc & binutils related--
Linking is generally quite a complicated process: the compiler has its own configuration files called "spec files", and the linker has "linker scripts". But explaining what those files do is an advanced topic, far beyond the scope of this question.

Related

Where are the header functions defined? [duplicate]

When I include some function from a header file in a C++ program, does the entire header file's code get copied into the final executable, or is machine code generated only for the specific function? For example, if I call std::sort from the <algorithm> header in C++, is machine code generated only for the sort() function or for the entire <algorithm> header file?
I think that a similar question exists somewhere on Stack Overflow, but I have tried my best to find it (I glanced over it once, but lost the link). If you can point me to that, it would be wonderful.
You're mixing two distinct issues here:
Header files, handled by the preprocessor
Selective linking of code by the C++ linker
Header files
These are simply copied verbatim by the preprocessor into the place that includes them. All the code of <algorithm> is copied into the .cpp file when you #include <algorithm>.
Selective linking
Most modern linkers won't link in functions that aren't getting called in your application. I.e., write a function foo and never call it - its code won't get into the executable. So if you #include <algorithm> and only use sort, here's what happens:
The preprocessor shoves the whole algorithm file into your source file
You call only sort
The linker analyzes this and only adds the code of sort (and the functions it calls, if any) to the executable. The other algorithms' code isn't getting added
That said, C++ templates complicate the matter a bit further. It's a complex issue to explain here, but in a nutshell - templates get expanded by the compiler for all the types that you're actually using. So if you have a vector of int and a vector of string, the compiler will generate two copies of the whole code for the vector class in your code. Since you are using it (otherwise the compiler wouldn't generate it), the linker also places it into the executable.
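If you want to observe this kind of stripping yourself, here is a minimal C sketch (the function names are made up; -ffunction-sections and --gc-sections are real gcc/ld options, though the exact result depends on your toolchain):

    /* strip.c */
    #include <stdio.h>

    void used(void)   { puts("used"); }
    void unused(void) { puts("unused"); }  /* never called */

    int main(void)
    {
        used();
        return 0;
    }

    /* Put each function in its own section, then let the linker
       discard the unreferenced ones:
       gcc -ffunction-sections -Wl,--gc-sections strip.c -o strip
       "nm strip | grep unused" should then come up empty. */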
In fact, the entire file is copied into the .cpp file, and it depends on the compiler/linker whether it picks up only the 'needed' functions, or all of them.
In general, simplified summary:
debug configuration means compiling in all of non-template functions,
release configuration strips all unneeded functions.
Plus it depends on attributes - a function declared for export will never be stripped.
On the other side, template function variants are 'generated' when used, so only the ones you explicitly use are compiled in.
EDIT: header file code isn't generated; in most cases it is hand-written.
If you #include a header file in your source code, it acts as if the text in that header was written in place of the #include preprocessor directive.
Generally headers contain declarations, i.e. information about what's inside a library. This way the compiler allows you to call things for which the code exists outside the current compilation unit (e.g. the .cpp file you are including the header from). When the program is linked into an executable that you can run, the linker decides what to include, usually based on what your program actually uses. Libraries may also be linked dynamically, meaning that the executable file does not actually include the library code; instead, the library is linked at runtime.
It depends on the compiler. Most compilers today do flow analysis to prune out uncalled functions. http://en.wikipedia.org/wiki/Data-flow_analysis

Where are header files present in Contiki?

I am trying to understand the IPv6-over-BLE UDP-client demo example present in examples/cc26xx/cc26xx-ble-client-demo. The code has the following header files:
#include "contiki.h"
#include "contiki-lib.h"
#include "contiki-net.h"
#define DEBUG DEBUG_FULL
#include "net/ip/uip-debug.h"
#include "net/ip/uiplib.h"
#include "net/ipv6/uip-icmp6.h"
I just want to know the locations of these header files in the Contiki file system, as the code for the main implementation of the BLE connection is under cpu/cc26xx-cc13xx/rf-core/*.[ch] and cpu/cc26xx-cc13xx/net/*.[ch]. I want to understand how the example code can use the methods present in files at different locations.
So you need to understand how an application is built.
All executable code is defined in C files and translated into machine code. Having said this, there might be modules written in other languages; the C runtime most probably has some assembler sources. We can call these "translation units", because each of them is translated separately on its own.
The header files just contain declarations of objects implemented in those translation units. A header file can combine declarations from several units or deliberately leave some declarations out.
If you compile one of your own sources which includes a header file, there will be no code of the referenced objects in your resulting object file.
At the link stage, the linker combines all object modules of your program, resolving the references between them. That means that if you have a call in one unit to a method in another module, that call will receive the correct address.
There are still unsatisfied references left, in particular those to library methods that were declared in those header files. The linker searches through the default libraries and the explicitly given ones to look for those methods. If it finds one, its code gets added and the call receives its address.
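You can see those unsatisfied references with standard binutils tools. A small sketch (the file name caller.c is made up; nm output varies by platform):

    /* caller.c -- calls printf() but does not define it */
    #include <stdio.h>

    int main(void)
    {
        printf("hello\n");
        return 0;
    }

    /* After "gcc -c caller.c", "nm caller.o" lists printf (or puts,
       if the compiler optimized the call) marked "U" -- undefined,
       left for the linker to resolve from the C library. */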
That's it, in a very short form. Please search the web for a broader description; you'll find a lot of them.
To answer your explicit question: "How can the example code use the methods present in files at different locations?"
The linker adds the machine code of these methods to the machine code of your modules. The location of their source code is irrelevant. The linker knows the location of the standard libraries. If you use additional libraries, add them (and their path, if needed) to the command line.
If you have any further questions, edit your question, please.
contiki.h, contiki-lib.h, contiki-net.h: these files are in the contiki/core/ folder.
uip-debug.h: contiki/core/net/ip
uiplib.h: contiki/core/net/ip
uip-icmp6.h: contiki/core/net/ipv6/
Please find the answers below:
contiki.h, contiki-lib.h, contiki-net.h: https://github.com/contiki-os/contiki/tree/master/core
uip-debug.h: https://github.com/contiki-os/contiki/blob/master/core/net/ip/uip-debug.h
You can find the remaining header files at the corresponding locations in the Contiki source tree.

C - tcc finds error in commdlg.h?

I'm attempting to write a C program incorporating OPENFILENAME and of course I need the header file. So, following the instructions provided with tcc, I downloaded the header files MinGW uses for the Win32 API (and the library files) and put them in the appropriate directories as instructed. However, when I come to compile the program, I get the following error:
In file included from sw1.c:2:
c:/prg/tcc/include/winapi/commdlg.h:503: declaration list expected
This seems rather odd, given that it's a standard header. So I looked up the line, and it's typedef __AW(CHOOSECOLOR) CHOOSECOLOR,*LPCHOOSECOLOR;, which doesn't look very valid to me - but then I'm not really a C expert, and I've mainly been writing on Linux. I have no idea why it's going wrong, though, and no knowledge of how to fix it. Is it a bug in tcc?
As evidence that this should be possible, here is the appropriate passage from the tcc readme:
Header Files:
The system header files (except _mingw.h) are from the MinGW
distribution:
http://www.mingw.org/
From the windows headers, only a minimal set is included. If you need
more, get MinGW's "w32api" package.
I understand that this question is similar to Error when including Windows.h with TCC but my "windows.h" file does work - it's just this that doesn't.
Does anyone know how to solve this? I'm really at a loose end!

How can I get lib function bodies in C?

As you can see above, I want to know how library functions (like printf) are made in C. I am using the Borland C++ compiler.
They are defined in lib files (*.lib); header files only have prototypes.
Lib files cannot be read in text editors.
So, please let me know how they can be read.
C is a compiled language, so the C source code gets translated to binary machine-language code.
Because of that, you can't see the actual source code of any given library you have.
If you want to know how it works, you can see if it's an open source library, find the source code of the particular revision that generated the version you're using, and read it.
If it's not open source, you could try decompiling - using a tool that tries to guess what the original source code that produced your library's machine code might have looked like. As you can guess, this is not an accurate process - compiling isn't an isomorphic process - and, as you probably wouldn't have guessed, it could be illegal - though I'm not really sure what conditions that depends on, if any.

Why use object files in C?

When I compile a C program, for ease I've been including the source file for a certain header at the end. So, if main.c includes util.h, util.h will have all the headers util.c will use, outline the types or structs, etc., and then at the very end it includes util.c. Then, when I compile, I only have to use gcc main.c -o main, and the rest is all taken care of.
I've been looking up C coding standards, trying to figure out what the best way to do things is, and there are just so many, and so many conflicting opinions, that I don't know what to think. Why do so many places recommend compiling object files individually instead of including all of them in a web? util never touches anything but util.c, so the two are perfectly independent, and in theory (my theory) it would be fine, but I'm probably wrong, since this is computer science and people are wrong even when they're right, so if I'm already wrong I'm probably wrong.
Some people say header files should ONLY be prototypes, and the source file should be the one that includes it and its necessary system headers. From a purely aesthetic point of view I much prefer having all the info (types, system headers used, prototypes) in the header (in this case util.h) and having ONLY function code in util.c (excluding one #include "util.h" at the very top).
I guess the point I'm getting at is, with all this stuff that works, selecting a method sounds arbitrary to someone who doesn't understand the background (me). Please tell me why and what.
While your program is small, this will work. At some point, however, your program will get large enough that recompiling the whole program every time you change one line is a pain in the rear.
This -- even more than avoiding editing huge files -- is the reason to split up your program. If main.c and util.c are separately compiled into object files, changing one line in a function in main.c will no longer require you to recompile all the code in util.c.
By the time your program is made up of a few dozen files, this will be a big win.
I think the point is that you want to include only what is needed for each file to be independent. This reduces overall compilation times by allowing the compiler to read only the headers that are necessary, rather than repeatedly reading every header when it might not need to. For example, if your util.c code utilises functions and/or types from <stdio.h> but your util.h doesn't, then you would want to include <stdio.h> only in util.c, so that when the compiler compiles util.c it includes <stdio.h> only then; but if you include <stdio.h> in your util.h instead, then every source file that includes util.h is also including <stdio.h>, whether it needs it or not.
This is very negligible for small projects with only a handful of files, but proper header inclusion can affect compilation times for larger projects.
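A small sketch of that placement rule (util.h and util.c are the hypothetical files from the question):

    /* util.h -- no #include <stdio.h> here, callers don't need it */
    #ifndef UTIL_H
    #define UTIL_H

    void util_log(const char *msg);

    #endif

    /* util.c -- the only file that actually needs <stdio.h> */
    #include <stdio.h>
    #include "util.h"

    void util_log(const char *msg)
    {
        fprintf(stderr, "%s\n", msg);
    }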
With regard to the question about "object files": when you compile a source file into an object file, you create a shortcut that allows a build system to recompile only the source files whose object files are outdated. This is an effective way to significantly reduce compilation times, especially for large projects.
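A minimal sketch of such a build with make (the file names follow the question; note that recipe lines must start with a tab):

    main: main.o util.o
    	gcc main.o util.o -o main

    main.o: main.c util.h
    	gcc -c main.c

    util.o: util.c util.h
    	gcc -c util.c

Touch only main.c, run make again, and only main.o is recompiled before relinking.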
First, including a .c file from a .h file is completely bass-ackwards.
The "standard" way of doing it follows a line of thought roughly like this:
You have a library, containing dozens of functions. Keeping everything in one big source file means that anyone using your library would have to link the whole library, even if he uses only a single function of it. (Imagine linking the whole C standard library for a puts( "Hello" ).)
So you split things across multiple source files, which are compiled individually. Whenever you make changes to one of your functions, you have to re-translate only one small source file and update the library archive (or executable) - instead of re-translating the whole thing every time. (This is still an issue, because code sizes have somewhat kept up with CPU improvements. Compiling something like the Boost lib can still take several minutes on not-too-fancy hardware...)
Now you are in a pinch, however. The function is defined inside the .c file, and the corresponding .o file can conveniently be linked (via a .a archive if need be). However, to actually address the function (provided by the .o file) properly from another source file (a.k.a. "translation unit"), your compiler needs to know the function name, its parameter list, and its return type. This is why the declaration of the function (i.e., the function head without its body) is put in a separate header (.h) file.
Other source files can now #include the header file, address the function properly (without the compiler being aware of what the function actually does), and when all parts of your library / program are compiled into .o files, then everything is linked together.
The source file includes its own header basically to make sure the two files agree on the function declaration. ;-)
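Put together, a minimal sketch of that layout (the names func.h, func.c, and main.c are invented for illustration):

    /* func.h -- the declaration other translation units include */
    #ifndef FUNC_H
    #define FUNC_H
    int func(int x);
    #endif

    /* func.c -- includes its own header so declaration and
       definition are checked against each other */
    #include "func.h"

    int func(int x) { return x * 2; }

    /* main.c -- another translation unit using func() */
    #include <stdio.h>
    #include "func.h"

    int main(void)
    {
        printf("%d\n", func(21));
        return 0;
    }

    /* Separate compilation, then linking:
       gcc -c func.c
       gcc -c main.c
       gcc main.o func.o -o main */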
That's about it, as far as I can be bothered to write it up right now. Putting everything into one monolithic source file is barely acceptable (actually, no, it isn't, not for anything beyond about 200 lines), but including the .c file at the end of the .h file either means you learned your C coding by looking at god-awful code instead of a good book, or whoever tutored you should never tutor another person on C coding in his life. No offense intended. ;-)
PS: Header files also provide a good summary / overview of a piece of code. Languages that don't provide headers - Java, for example - need IDEs or documentation tools to extract this kind of information. Personally, I found header files to be a benefit, not a liability.
Please use *.h and *.c files as customary: *.h files are #included in *.c files, and they contain only macro definitions, data type declarations, function declarations, and extern data declarations. All definitions go in *.c files. That is how everybody else organizes C programs; do your fellow humans (who might some day need to understand your program) a favor. If something in file.c is used outside that file, you write a file.h containing the declarations of whatever in that file is to be used outside, and include it in file.c (to check that declarations and definitions agree) and in all the *.c files that use it. If a bunch of *.h files are always included together, it might mean that the split into *.c files isn't right (or at least the split into *.h files; perhaps you should make one .h that includes all those declarations, and create *.h files for internal use where needed among the group of related *.c files).
[If a program written as you outline crosses my path, I can assure you I'll avoid it like the plague. The extra obfuscation might be welcome in the IOCCC, but not to me. It is a sure sign of somebody who doesn't know how to organize a program cleanly, and so the program probably isn't worth trying out.]
Re: Separate compilation: you break up a C program so the pieces are easier to understand, and you can hide details of how things work inside the C files (think static); this provides support for Parnas-style modularity. It also means that if you change a file, you don't have to recompile everything.
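A quick sketch of that information hiding (the names are invented; static limits a function's linkage to its own translation unit):

    /* counter.c -- helper() is invisible outside this file;
       a hypothetical counter.h would declare only counter_next() */

    static int helper(int x)   /* internal detail, not declared in any header */
    {
        return x + 1;
    }

    int counter_next(int c)    /* the public interface */
    {
        return helper(c);
    }

Other .c files can call counter_next(), but a reference to helper() from outside counter.c fails at link time.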
Re: Differing C programming standards: yes, there are lots of them around. Pick one you feel comfortable with, and stick to it. If you work on a project, adhere to its standards.
The "include in a single translation unit" approach becomes very inefficient for any significantly sized project, it is impractical for projects that are distributed amongst multiple developers.
Moreover, when creating static libraries, if everything in the library were in a single translation unit, any code linked against it would get all the library code, regardless of whether it is referenced or not.
A project using a build manager such as make, or the features available in most IDEs, uses header file dependencies to allow an incremental build: only those sources that are modified, or that depend on modified files, are recompiled. The dependencies are determined by the file inclusions, so minimising redundant dependencies speeds up build time.
A typical commercial project can comprise hundreds of thousands of lines of code and a few hundred source files; full rebuild times can vary from minutes to hours. If in your development cycle you had to wait that long between code changes and tests, productivity would be very low!
