What are the pros and cons of shoving everything in one file:
void function(void) {
    code...
}
Versus creating a completely new file for functions:
#include <stdio.h>
#include "header.h"
Is one or the other faster? More lightweight? I am in a situation where speed is necessary and portability is a must.
Might I add this is all based on C.
If you care about speed, you should first write a correct program, choose efficient algorithms (read Introduction to Algorithms), benchmark and profile it (perhaps using gprof and/or oprofile), and focus your efforts mostly on the few percent of source code that are critical to performance.
You are better off defining these small critical functions as static inline functions in commonly included header files. The compiler is then able to inline every call to them if it chooses to (and it needs access to the function's definition in order to inline it).
In general, small inlined functions often run faster, because there is no call overhead in the compiled machine code; occasionally they may run slightly slower, because inlining increases machine-code size, which is detrimental to CPU cache efficiency (read about locality of reference). Also, a header file with many static inline functions takes more time to compile.
As a concrete example, my Linux system has a header /usr/include/glib-2.0/glib/gstring.h (from GLib, used by GTK) containing:
/* -- optimize g_string_append_c --- */
#ifdef G_CAN_INLINE
static inline GString*
g_string_append_c_inline (GString *gstring,
                          gchar    c)
{
  if (gstring->len + 1 < gstring->allocated_len)
    {
      gstring->str[gstring->len++] = c;
      gstring->str[gstring->len] = 0;
    }
  else
    g_string_insert_c (gstring, -1, c);
  return gstring;
}
#define g_string_append_c(gstr,c) g_string_append_c_inline (gstr, c)
#endif /* G_CAN_INLINE */
The G_CAN_INLINE preprocessor flag would have been enabled by some previously included header file.
It is a good example of an inline function: it is short (a dozen lines) and its own code runs quickly (excluding the time spent in calls to g_string_insert_c), so it is worth defining as static inline.
It is not worth defining as inline a function whose body already takes significant time to run by itself. There is no point inlining a matrix multiplication, for example: the call overhead is insignificant relative to the time needed to multiply even 8x8, let alone 100x100, matrices. So choose carefully the functions you want to inline.
You should trust the compiler and enable its optimizations (in particular when benchmarking or profiling). For GCC, that means compiling with gcc -O3 -march=native (on x86; some ARM toolchains use -mcpu=native), and I also recommend -Wall -Wextra to get useful warnings. You can use link-time optimization by compiling and linking with gcc -flto -O3 -march=native.
You need to be clear about the concepts of header files, translation units and separate compilation.
The #include directive does nothing more than insert the content of the included file at the point of inclusion, as if it were all one file; so in that sense placing content into a header file is no different, semantically or in performance, from "shoving everything in one file".
The point is that this is not how header files should be used or what they are intended for; you will quickly run into linker errors and/or code bloat on anything other than the most trivial programs. A header file should generally contain only declarations, not definitions. Take a look inside the standard headers, for example: you will find no function definitions, only declarations (there may be some interfaces defined as macros or, since C99, possibly inline functions, but that is a different issue).
What header files provide is a means to support separate compilation and linking of code in separate translation units. A translation unit is a source file (.c in this case) with all of its #include'd and #define'd content expanded by the pre-processor before actual compilation.
When the compiler builds a translation unit, there will be unresolved references to external code declared in headers. These declarations are a promise to the compiler that an interface of the declared form is defined elsewhere and will be resolved by the linker.
The conventional form (although there are few restrictions to stop you from doing unconventional or foolish things) of a multi-module C program source is as follows:
main.c
#include "foobar.h"

int main( void )
{
    int x = foo() ;
    bar( x ) ;
    return 0 ;
}
foobar.h
#if !defined foobar_INCLUDE
#define foobar_INCLUDE
int foo( void ) ;
void bar( int x ) ;
#endif
Note the use of the pre-processor here to prevent multiple declarations when a file is included more than once, which can happen in complex code bases with nested includes, for example. All your headers should have such "include guards"; some compilers support #pragma once to do the same thing, but it is less portable.
foobar.c
#include "foobar.h"
int foo( void )
{
    int x = 0 ;
    // do something
    return x ;
}

void bar( int x )
{
    // do something
}
Then main.c and foobar.c (and any other modules) are separately compiled and then linked; the linker also resolves references to library interfaces provided by the standard library or any other external libraries. A library in this sense is simply a collection of previously compiled object code.
Now that that is perhaps clear, your question can be answered by re-presenting it as the pros and cons of separate compilation and linking. The benefits are:
Code reuse - you can build your own libraries of useful routines that can be reused in many projects without error-prone copy and pasting.
Build-time reduction - on a non-trivial application, separate compilation and linking is managed by a build manager such as make or an IDE such as Eclipse or Visual Studio; these tools perform incremental builds, compiling only those modules for which the source or one of its header dependencies has been modified. This means you are not compiling all the code all the time, so turn-around during debugging and testing is much faster.
Development team scalability - if all your code is in one file, it becomes almost impractical to have multiple developers working on the same project at once. If you want to work with others, either on open-source projects or as a career (the two are not necessarily mutually exclusive, of course), you really cannot consider the all-in-one approach. Not least because your fellow developers will not take you seriously if that is your practice.
Specifically, separate compilation and linking has zero impact on performance or code size under normal circumstances. There is possibly an impact on the compiler's ability to optimise in some cases, when it cannot see all of the code at one time; but if your code is carefully partitioned according to the principles of high cohesion and minimal coupling, this potential loss of opportunity is probably insignificant. Moreover, modern linkers can perform some cross-module optimisations, such as unused-code removal, in any case.
It's not a question of which one is "faster". Header files are created when you have a function or functions that you want to use in many other places or in other projects. For example, if you have written a function to calculate the factorial of a number and want to use it in other programs (or find yourself replicating the same code across programs), then instead of rewriting the function each time, it is more convenient to put it in a header file. Generally, a header file contains functions relevant to a certain subject (math.h declares functions for mathematical calculations, not for string processing).
Background
I'm building a C application for an embedded Cortex M4 TI-RTOS SYS/BIOS target, however this question should apply to all embedded targets where a single binary is loaded onto some microprocessor.
What I want
I'd like to do some in situ regression tests on the target where I just replace a single function with some test function instead. E.g. a GetAdcMeasurement() function would return predefined values from a read-only array instead of doing the actual measurement and returning that value.
This could of course be done with a mess of #ifdefs, but I'd rather keep the production code as untouched as possible.
My attempt
I figure one way to achieve this would be to have duplicate symbol definitions at the linker stage, and then have the linker prioritise the definitions from the test suite (given some #define).
I've looked into using LD_PRELOAD, but that doesn't really seem to apply here (since I'm using only static objects).
Details
I'm using TI Code Composer, with TI-RTOS & SYS/BIOS on the Sitara AM57xx platform, compiling for the M4 remote processor (denoted IPU1).
Here's the path to the compiler and linker
/opt/ti/ccsv7/tools/compiler/ti-cgt-arm_16.9.6.LTS/bin/armcl
One solution could be to have two .c files for each module, one with the production code and one with the test code, and compile and link with one of the two. The globals and function signatures in both .c files must be the same (at least: there may be more symbols, but not fewer).
Another solution, building on the previous one, is to have two libraries, one with the production code and one with the test code, and link with one of the two. You could even link with both libraries, with the test version first, as linkers often resolve symbols in the order they are encountered.
And, as you said, you could work with a bunch of #ifdefs, which has the advantage of needing just one .c file, but makes the code less readable.
I would not go for #ifdefs at the function level, i.e. redefining just one function of a .c file for test and keeping the others as-is; however, if necessary, it could be a way. And if necessary, you could have one (or two) .c files for each function, but that would negate the module concept.
I think the first approach would be the cleanest.
One additional approach (apart from Paul Ogilvie's) would be to have your mocking header also create a define which will replace the original function symbol at the pre-processing stage.
I.e. if your mocking header looks like this:
// mock.h
#ifdef MOCKING_ENABLED
adcdata_t GetAdcMeasurement_mocked(void);
stuff_t GetSomeStuff_mocked(void);
#define GetAdcMeasurement GetAdcMeasurement_mocked
#define GetSomeStuff GetSomeStuff_mocked
#endif
Then whenever you include the file, the preprocessor will replace the calls before it even hits the compiler:
#include "mock.h"
void SomeOtherFunc(void)
{
    // preprocessor will change this symbol into 'GetAdcMeasurement_mocked'
    adcdata_t data = GetAdcMeasurement();
}
This approach might confuse the unsuspecting reader of your code, because they won't necessarily realize that you are calling a different function altogether. Nevertheless, I find this approach has the least impact on the production code (apart from adding the include, obviously).
(This is a quick summary of the discussion in the comments; thanks for the answers.)
A function can be redefined if it has the weak attribute, see
https://en.wikipedia.org/wiki/Weak_symbol
On GCC that would be the weak attribute, e.g.
int __attribute__((weak)) power2(int x);
and on the armcl (as in my question) that would be the pragma directive
#pragma weak power2
int power2(int x);
Letting the production code consist of partly weak functions will allow a test framework to replace single functions.
I'm experiencing a strange issue when I try to compile two source files that contain some important computing algorithms that need to be highly optimized for speed.
Initially, I have two source files, let's call them A.c and B.c, each containing multiple functions that call each other (functions from a file may call functions from the other file). I compile both files with full speed optimizations and then when I run the main algorithm in an application, it takes 900 ms to run.
Then I notice the functions from the two files are mixed up from a logical point of view, so I move some functions from A.c to B.c; let's call the new files A2.c and B2.c. I also update the two headers A.h and B.h by moving the corresponding declarations.
Moving function definitions from one file to the other is the only modification I make!
The strange result is that after I compile the two files again with the same optimizations, the algorithm now takes 1000 ms to run.
What is going on here?
What I suspect happens: when function f calls function g, being in the same file allows the compiler to replace the actual function call with inlined code as an optimization. This is no longer possible when the definitions are not compiled at the same time.
Am I correct in my assumption?
Aside from regrouping the function definitions as it was before, is there anything I can do to obtain the same optimization as before? I researched and it seems it's not possible to compile two source files simultaneously into a single object file. Could the order of compilation matter?
As to whether your assumption is correct, the best way to tell is to examine the assembler output, such as by using gcc -S or gcc -save-temps. That will be the definitive way to see what your compiler has done.
As to compiling two C source files into a single object file, that's certainly doable. Just create an AB.c as follows:
#include "A.c"
#include "B.c"
and compile that.
Barring things that should be kept separate (such as static items which may exist in both C files), that should work (or at least work with a little modification).
However, remember the optimisation mantra: Measure, don't guess! You're giving up a fair bit of encapsulation by combining them so make sure the benefits well outweigh the costs.
What is the point of having header files in C, if the header file includes not only function prototypes but also complete functions? I came across the file kdev_t.h in the Linux source, which had the following:
static inline dev_t new_decode_dev(u32 dev)
{
    unsigned major = (dev & 0xfff00) >> 8;
    unsigned minor = (dev & 0xff) | ((dev >> 12) & 0xfff00);
    return MKDEV(major, minor);
}
Why the .h extension? This question refers to classes in C++ but I'm not sure if the same principle applies here.
I don't have any sources to back me up but I think it's a combination of the following reasons:
Optimization. The function is small enough that optimization is easier if the complete implementation is available to the compiler. Some of these small functions are called so often in the kernel that the cost of a function call can quickly add up. In your example, the code really only does some bit-shifting tricks; that doesn't warrant a whole call into another object file. With features like LTO, though, I'm not so sure this is still relevant.
Macros are a pain for anything longer than a one-liner. Sometimes writing a separate function is a little overkill, but the task is too long to fit in a (human-readable) macro. Macros also come with their own set of headaches. static inline functions provide essentially the same performance benefits as a macro while maintaining human readability.
Empty implementations. The kernel is very configurable, and you see a lot of #ifdefs for options. Sometimes, if a kernel feature is disabled, we still want its functions to be callable but to just return an error. These empty implementations are best placed in the header files instead of the object files so the kernel doesn't have to run the compiler on every single C file in the source tree; in other words, compile times improve. There are obviously also run-time optimizations, since the compiler knows in advance where dead code lies.
I think a combination of the three reasons above is why Linux puts some small functions in header files.
I've been doing some tests with Valgrind to understand how functions are translated by the compiler, and have found that functions defined in different files sometimes perform poorly compared to functions defined in the same source file, due to not being inlined.
Considering I have different files, each containing functions related to a particular area and all files share a common header declaring all functions, is this expected?
Why doesn't the compiler inline them when they are defined in different files, but does when they are in the same file?
If this behavior starts to cause performance issues, what is the recommended course of action? Putting all of the functions in the same file manually before compiling?
example:
//source 1
void foo(char *str1, char *str2)
{
    //here goes the code
}

//source 2
void *bar(int something, char *somethingElse)
{
    //bar code
    foo(variableInsideBar, anotherVariableCreatedInsideBar);
    return variableInsideBar;
}
Sample performance cost:
On different files: 29920
Both on the same file: 8704
For bigger functions it is not as pronounced, but still happens.
If you are using gcc, you could try the options -combine and -fwhole-program and pass all the source files to the compiler in one invocation (note that -combine was removed in GCC 4.5; with modern GCC, use -flto instead). Traditionally, different C files are compiled separately, but it is becoming more common to optimize across compilation units (files).
The compiler proper cannot inline functions defined in different translation units simply because it cannot see the definitions of those functions, i.e. it cannot see their source code. Historically, C compilers (and the language itself) were built around the principle of independent translation. Each translation unit is compiled from source code into object code completely independently from other translation units. Only at the very last stage of translation are all these disjoint pieces of object code assembled into a final program by the so-called linker. In a traditional compiler implementation, at that point it is already too late to inline anything.
As you probably know, the language-level support for function inlining says that in order for a function to be "inlinable" in some translation unit, it has to be defined in that translation unit, i.e. the source code of its body must be visible to the compiler there. This requirement stems directly from the aforementioned principle of independent translation.
Many modern compilers are gradually introducing features that overcome the limitations of the classic pure independent translation. They implement features like global optimizations which allow various optimizations that cross the boundaries of translation units. That potentially includes the ability to inline functions defined in other translation units. Consult your compiler documentation in order to see whether it can inline functions across translation units and how to enable this sort of optimizations.
The reason such global optimizations are usually disabled by default is that they can significantly increase the translation time.
Well spotted. I think that's because when you compile something, the compiler first turns each .c file into an object file without looking at any other files, and once the object file is made, no further optimisations are applied to it.
I don't think it costs much performance, though.
I know there are at least three popular methods to call the same function with multiple names. I haven't actually heard of someone using the fourth method for this purpose.
1). Could use #defines:
int my_function (int);
#define my_func my_function
OR
#define my_func(a) my_function(a)
2). Embedded function calls are another possibility:
int my_func(int a) {
    return my_function(a);
}
3). Use a weak alias in the linker:
int my_func(int a) __attribute__((weak, alias("my_function")));
4). Function pointers:
int (* const my_func)(int) = my_function;
The reason I need multiple names is for a mathematical library that has multiple implementations of the same method.
For example, I need an efficient method to calculate the square root of a scalar floating point number. So I could just use math.h's sqrt(). This is not very efficient. So I write one or two other methods, such as one using Newton's Method. The problem is each technique is better on certain processors (in my case microcontrollers). So I want the compilation process to choose the best method.
I think this means it would be best to use either the macros or the weak alias since those techniques could easily be grouped in a few #ifdef statements in the header files. This simplifies maintenance (relatively). It is also possible to do using the function pointers, but it would have to be in the source file with extern declarations of the general functions in the header file.
Which do you think is the better method?
Edit:
From the proposed solutions, there appears to be two important questions that I did not address.
Q. Are the users working primarily in C/C++?
A. All known development will be in C/C++ or assembly. I am designing this library for my own personal use, mostly for work on bare metal projects. There will be either no or minimal operating system features. There is a remote possibility of using this in full blown operating systems, which would require consideration of language bindings. Since this is for personal growth, it would be advantageous to learn library development on popular embedded operating systems.
Q. Are the users going to need/want an exposed library?
A. So far, yes. Since it is just me, I want to make direct modifications for each processor I use after testing. This is where the test suite would be useful. So an exposed library would help somewhat. Additionally, each "optimal implementation" for particular function may have a failing conditions. At this point, it has to be decided who fixes the problem: the user or the library designer. A user would need an exposed library to work around failing conditions. I am both the "user" and "library designer". It would almost be better to allow for both. Then non-realtime applications could let the library solve all of stability problems as they come up, but real-time applications would be empowered to consider algorithm speed/space vs. algorithm stability.
Another alternative would be to move the functionality into a separately compiled library optimised for each different architecture and then just link to this library during compilation. This would allow the project code to remain unchanged.
Depending on the intended audience for your library, I suggest you choose between two alternatives:
If the consumer of your library is guaranteed to be Cish, use #define sqrt newton_sqrt for optimal readability
If some consumers of your library are not of the C variety (think bindings to Delphi, .NET, whatever), try to avoid consumer-visible #defines. This is a major PITA for bindings, as macros are not visible in the binary; embedded function calls are the most binding-friendly.
What you can do is this. In the header file (.h):
int function(void);
In the source file (.c):
static int function_implementation_a(void);
static int function_implementation_b(void);
static int function_implementation_c(void);
#if ARCH == ARCH_A
int function(void)
{
    return function_implementation_a();
}
#elif ARCH == ARCH_B
int function(void)
{
    return function_implementation_b();
}
#else
int function(void)
{
    return function_implementation_c();
}
#endif // ARCH
Static functions called once are often inlined by the implementation. This is the case for example with gcc by default : -finline-functions-called-once is enabled even in -O0. The static functions that are not called are also usually not included in the final binary.
Note that I don't put the #if and #else inside a single function body, because I find the code more readable when #if directives are outside the function bodies.
Note that this way works better with embedded code, where libraries are usually distributed in source form.
I usually like to solve this with a single declaration in a header file with a different source file for each architecture/processor-type. Then I just have the build system (usually GNU make) choose the right source file.
I usually split the source tree into separate directories for common code and for target-specific code. For instance, my current project has a toplevel directory Project1 and underneath it are include, common, arm, and host directories. For arm and host, the Makefile looks for source in the proper directory based on the target.
I think this makes it easier to navigate the code since I don't have to look up weak symbols or preprocessor definitions to see what functions are actually getting called. It also avoids the ugliness of function wrappers and the potential performance hit of function pointers.
You might create a test suite for all the algorithms and run it on the target to determine which perform best, then have the test suite automatically generate the necessary linker aliases (method 3).
Beyond that, a simple #define (method 1) is probably the simplest and will not add any overhead. It does, however, expose to the library user that there might be multiple implementations, which may be undesirable.
Personally, since only one implementation of each function is likely to be optimal on any specific target, I'd use the test suite to determine the required version for each target, then build a separate library for each target containing only that version of each function, under the correct function name directly.