When is "inline" ineffective? (in C)

Some people love using the inline keyword in C and put big functions in headers. When do you consider this to be ineffective? I sometimes even find it annoying, because it is unusual.
My principle is that inline should be used for small functions that are called very frequently, or in order to get real type checking. Anyhow, my taste guides me, but I am not sure how best to explain why inline is not so useful for big functions.
In this question people suggest that the compiler can do a better job of guessing the right thing to do. That was also my assumption. When I try to use this argument, people reply that it does not work with functions coming from different object files. Well, I don't know (for example, using GCC).
Thanks for your answers!

inline does two things:
1. Gives you an exemption from the "one definition rule" (see below). This always applies.
2. Gives the compiler a hint to avoid a function call. The compiler is free to ignore this.
#1 can be very useful (e.g. put the definition in a header if it is short) even if #2 is disabled.
In practice compilers often do a better job of working out what to inline themselves (especially if profile guided optimisation is available).
[EDIT: Full References and relevant text]
The two points above both follow from the ISO/ANSI standard (ISO/IEC 9899:1999(E), commonly known as "C99").
In §6.9 "External Definition", paragraph 5:
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
While the equivalent rule in C++ is explicitly named the One Definition Rule (ODR), it serves the same purpose. An external (i.e. not "static", which would make it local to a single translation unit, typically a single source file) can be defined only once, unless it is a function and inline.
In §6.7.4, "Function Specifiers", the inline keyword is defined:
Making a function an inline function suggests that calls to the function be as
fast as possible.[118] The extent to which such suggestions are effective is
implementation-defined.
And the footnote, which is non-normative but provides clarification:
By using, for example, an alternative to the usual function call mechanism, such as ‘‘inline substitution’’. Inline substitution is not textual substitution, nor does it create a new function. Therefore, for example, the expansion of a macro used within the body of the function uses the definition it had at the point the function body appears, and not where the function is called; and identifiers refer to the declarations in scope where the body occurs. Likewise, the function has a single address, regardless of the number of inline definitions that occur in addition to the external definition.
Summary: what most users of C and C++ expect from inline is not what they get. Its apparent primary purpose, avoiding function call overhead, is completely optional. But to allow separate compilation, a relaxation of the single-definition rule is required.
(All emphasis in the quotes from the standard.)
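As a rough illustration of the first point, here is a minimal sketch of the C99 inline model, assuming a C99-or-later compiler. The names twice and twice_plus_one are made up for the example, and everything is shown in one file only to keep it self-contained; in a real project the first definition would sit in a header and the extern line in exactly one source file.

```c
/* Would normally live in a header (say twice.h, a hypothetical name) that
   every user includes. On its own this is only an "inline definition":
   it does not provide an external symbol. */
inline int twice(int x)
{
    return 2 * x;
}

/* Would normally appear in exactly one .c file. Redeclaring the function
   with extern turns that file's copy into the single external definition
   the linker can fall back on when a call is not inlined. */
extern inline int twice(int x);

/* A caller: the compiler may substitute the body or emit a real call. */
int twice_plus_one(int x)
{
    return twice(x) + 1;
}
```

Either way the call is compiled, the program behaves the same; only the generated code differs.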
EDIT 2: A few notes:
There are various restrictions on external inline functions. You cannot have a static variable in the function, and you cannot reference static TU scope objects/functions.
Just seen this on VC++'s "whole program optimisation", which is an example of a compiler doing its own inline thing, rather than the author.

The important thing about an inline declaration is that it doesn't necessarily do anything. In many cases a compiler is free to inline a function not declared inline, and to emit ordinary out-of-line calls to functions that are declared inline.

Another reason why you shouldn't use inline for large functions is the case of libraries. Every time you change an inline function, you might lose ABI compatibility, because an application compiled against an older header still has the old version of the function inlined. If inline functions are used as typesafe macros, chances are great that the function never needs to change during the life cycle of the library. But for big functions this is hard to guarantee.
Of course, this argument only applies if the function is part of your public API.

An example to illustrate the benefits of inline. sinCos.h:
#include <stdint.h>
/* TWO_PI and PI_OVER_TWO are table-index constants defined elsewhere */
int16_t sinLUT[ TWO_PI ];
static inline int16_t sin_LUT( int16_t x ) {
    return sinLUT[(uint16_t)x];
}
/* defined after sin_LUT so that the call below sees its declaration */
static inline int16_t cos_LUT( int16_t x ) {
    return sin_LUT( x + PI_OVER_TWO );
}
When you are doing heavy number crunching and want to avoid wasting cycles on computing sin/cos, you replace sin/cos with a LUT.
When you compile without inline the compiler will not optimize the loop and the output .asm will show something along the lines of :
;*----------------------------------------------------------------------------*
;* SOFTWARE PIPELINE INFORMATION
;* Disqualified loop: Loop contains a call
;*----------------------------------------------------------------------------*
When you compile with inline the compiler has knowledge about what happens in the loop and will optimize because it knows exactly what is happening.
The output .asm will have an optimized "pipelined" loop ( i.e. it will try to fully utilize all the processor's ALUs and try to keep the processor's pipeline full without NOPS).
In this specific case, I was able to increase my performance by about 2X or 4X which got me within what I needed for my real time deadline.
p.s. I was working on a fixed point processor... and any floating point operations like sin/cos killed my performance.
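To make the loop situation concrete, here is a small sketch under stated assumptions: the four-entry table, LUT_SIZE, and the function names are hypothetical and chosen only so the example is easy to check by hand. Because the accessors are static inline and visible in the same translation unit, an optimizing compiler is able to see through them inside the loop.

```c
#include <stddef.h>
#include <stdint.h>

#define LUT_SIZE 4u  /* hypothetical: a real table would be much larger */

/* One full period of a sine wave, sampled at four points. */
static const int16_t sin_table[LUT_SIZE] = { 0, 32767, 0, -32767 };

static inline int16_t sin_lut(uint16_t phase)
{
    return sin_table[phase % LUT_SIZE];
}

static inline int16_t cos_lut(uint16_t phase)
{
    /* cos(x) = sin(x + quarter period) */
    return sin_lut((uint16_t)(phase + LUT_SIZE / 4u));
}

/* With the accessors inlined the loop body contains no call, which is the
   condition the software pipeliner in the answer was complaining about. */
int32_t sum_cos(const uint16_t *phases, size_t count)
{
    int32_t acc = 0;
    for (size_t i = 0; i < count; ++i)
        acc += cos_lut(phases[i]);
    return acc;
}
```

Whether the loop actually gets pipelined still depends on the compiler and target, so checking the generated assembly remains the only reliable test.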

Inline is ineffective when the function is called through a function pointer.
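A common place where this shows up is a callback. The sketch below uses the standard qsort; compare_ints and sort_ints are illustrative names. Taking the address of the comparator means an out-of-line copy has to exist, and qsort calls it indirectly, so the inline keyword on the comparator usually buys nothing unless the compiler can also see and specialize qsort itself.

```c
#include <stdlib.h>

static inline int compare_ints(const void *a, const void *b)
{
    int lhs = *(const int *)a;
    int rhs = *(const int *)b;
    return (lhs > rhs) - (lhs < rhs);  /* avoids overflow of lhs - rhs */
}

void sort_ints(int *values, size_t count)
{
    /* The function is passed by address and called through a pointer
       inside the library, so each comparison is typically a real call. */
    qsort(values, count, sizeof values[0], compare_ints);
}
```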

Inline is effective in one case: when you've got a performance problem, ran your profiler with real data, and found the function call overhead for some small functions to be significant.
Outside of that, I can't imagine why you'd use it.

That's right. Using inline for big functions increases compile time and brings little extra performance to the application. Inline functions tell the compiler that a function is to be included without a call, so they should be small pieces of code that are repeated many times. In other words: for big functions, the cost of making the call is negligible compared to the cost of executing the function body itself.

I mainly use inline functions as typesafe macros. There's been talk about adding support for link-time optimizations to GCC for quite some time, especially since LLVM came along. I don't know how much of it actually has been implemented yet, though.

Personally I don't think you should ever inline, unless you have first run a profiler on your code and have proven that there is a significant bottleneck on that routine that can be partially alleviated by inlining.
This is yet another case of the Premature Optimization Knuth warned about.

Inline can be used for small and frequently used functions such as getter or setter methods. For big functions it is not advisable to use inline, as it increases the executable size.
Also, for recursive functions, the compiler will generally ignore inline even if you specify it.

inline acts as a hint only.
It was added to the C standard only in C99, so it works only with compilers that support C99 or later.

Inline functions should be approximately 10 lines or less, give or take, depending on your compiler of choice.
You can tell your compiler that you want something inlined, but it's up to the compiler to do so. There is no -force-inline option that I know of which the compiler can't ignore. That is why you should look at the assembler output and see whether your compiler actually did inline the function and, if not, why not. Many compilers just silently say 'screw you!' in that respect.
So if:
static inline unsigned int foo(const char *bar)
.. does not improve things over static int foo(), it's time to revisit your optimizations (and likely your loops) or argue with your compiler. Take special care to argue with your compiler first, not the people who develop it, or you're just in store for lots of unpleasant reading when you open your inbox the next day.
Meanwhile, when making something (or attempting to make something) inline, does doing so actually justify the bloat? Do you really want that function expanded every time it's called? Is the jump so costly? Your compiler is usually correct 9 times out of 10; check the intermediate output (or asm dumps).
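For what it is worth, some compilers do offer non-standard ways to push harder than the plain keyword: GCC and Clang document an always_inline function attribute, and MSVC documents __forceinline. The sketch below wraps them in a hypothetical FORCE_INLINE macro; count_chars and length_of_hello are made-up names. These are extensions with their own limits, so inspecting the output is still advisable.

```c
/* FORCE_INLINE is a hypothetical convenience macro, not a standard name. */
#if defined(__GNUC__)            /* GCC and Clang */
#define FORCE_INLINE static inline __attribute__((always_inline))
#elif defined(_MSC_VER)          /* MSVC */
#define FORCE_INLINE static __forceinline
#else                            /* fall back to the plain hint */
#define FORCE_INLINE static inline
#endif

FORCE_INLINE unsigned int count_chars(const char *bar)
{
    unsigned int n = 0;
    while (bar[n] != '\0')
        ++n;
    return n;
}

unsigned int length_of_hello(void)
{
    return count_chars("hello");
}
```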

Related

Can a function know what's calling it?

Can a function tell what's calling it, through the use of memory addresses maybe? For example, function foo(); gets data on whether it is being called in main(); rather than some other function?
If so, is it possible to change the content of foo(); based on what is calling it?
Example:
int foo()
{
    if (being called from main())
        printf("Hello\n");
    if (being called from some other function)
        printf("Goodbye\n");
}
This question might be kind of out there, but is there some sort of C trickery that can make this possible?
For highly optimized C it doesn't really make sense. The harder the compiler tries to optimize the less the final executable resembles the source code (especially for link-time code generation where the old "separate compilation units" problem no longer prevents lots of optimizations). At least in theory (but often in practice for some compilers) functions that existed in the source code may not exist in the final executable (e.g. may have been inlined into their caller); functions that didn't exist in the source code may be generated (e.g. compiler detects common sequences in many functions and "out-lines" them into a new function to avoid code duplication); and functions may be replaced by data (e.g. an "int abcd(uint8_t a, uint8_t b)" replaced by a abcd_table[a][b] lookup table).
For strict C (no extensions or hacks), no. It simply can't support anything like this because it can't expect that (for any compiler including future compilers that don't exist yet) the final output/executable resembles the source code.
An implementation defined extension, or even just a hack involving inline assembly, may be "technically possible" (especially if the compiler doesn't optimize the code well). The most likely approach would be to (ab)use debugging information to determine the caller from "what the function should return to when it returns".
A better way for a compiler to support a hypothetical extension like this may be for the compiler to use some of the optimizations I mentioned - specifically, split the original foo() into 2 separate versions where one version is only ever called from main() and the other version is used for other callers. This has the bonus of letting the compiler optimize out the branches too - it could become like int foo_when_called_from_main() { printf("Hello\n"); }, which could be inlined directly into the caller, so that neither version of foo exists in the final executable. Of course if foo() had other code that's used by all callers then that common code could be lifted out into a new function rather than duplicating it (e.g. so it might become like int foo_when_called_from_main() { printf("Hello\n"); foo_common_code(); }).
There probably isn't any hypothetical compiler that works like that, but there's no real reason you can't do these same optimizations yourself (and have it work on all compilers).
Note: Yes, this was just a crafty way of suggesting that you can/should refactor the code so that it doesn't need to know which function is calling it.
Knowing who called a specific function is essentially what a stack trace visualizes. There is no general, standard way of extracting that, though. In theory one could write code that targets each system type the software would run on, and implement a stack trace function for each of them. In that case you could examine the stack and see what comes before the current function.
But with all that said and done, the question you should probably ask is why? Writing a function that functions in a specific way when called from a specific function is not well isolated logic. Instead you could consider passing in a parameter to the function that caused the change in logic. That would also make the result more testable and reliable.
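A minimal sketch of the parameter-passing approach, applied to the foo() from the question. The enum and its enumerator names are invented for the example, and the message lookup is split out only so the behaviour can be checked without capturing output.

```c
#include <stdio.h>

/* Hypothetical tag that the caller passes in explicitly. */
enum foo_caller { FOO_CALLED_FROM_MAIN, FOO_CALLED_FROM_ELSEWHERE };

const char *foo_message(enum foo_caller caller)
{
    return caller == FOO_CALLED_FROM_MAIN ? "Hello" : "Goodbye";
}

int foo(enum foo_caller caller)
{
    /* The behaviour depends on an argument, not on inspecting the stack. */
    return puts(foo_message(caller));
}
```

main() would call foo(FOO_CALLED_FROM_MAIN) and every other function foo(FOO_CALLED_FROM_ELSEWHERE), which also makes both branches trivial to unit test.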
How to actually extract a stack trace has already received many answers here: How can one grab a stack trace in C?
I don't think an if statement in C can have a condition like the one you describe.
If you want to know whether this function is called from main(), you have to do the printf in main() itself, and likewise in the other function.
I don't really know what you are trying to achieve, but as far as I understand it, each function could pass an additional argument that uniquely identifies the caller, in the form of a character array, an integer or an enumeration.
For example:
/* enumerators share a namespace with functions, so they cannot be named main, add, ... */
enum caller { CALLER_MAIN, CALLER_ADD, CALLER_SUB, CALLER_DIV, CALLER_MUL };
and call functions like:
add(3, 5, CALLER_MAIN); // adds 3 and 5; called from main
You would have to extend the enumeration whenever you add more functions, but it's an easy way to do it.
No. The C language does not support obtaining the name or other information of who called a function.
As all other answers show, this can only be obtained using external tools, for example that use stack traces and compiler/linker emitted symbol tables.

When to use static inline instead of regular functions

When I inspect other people's code, I sometimes encounter static inline functions implemented in header files as opposed to regular function implementations in C files.
For example, the cache.h header file (https://github.com/git/git/blob/master/cache.h) of git contains many such functions. One of them is copied below:
static inline void copy_cache_entry(struct cache_entry *dst,
                                    const struct cache_entry *src)
{
    unsigned int state = dst->ce_flags & CE_HASHED;
    /* Don't copy hash chain and name */
    memcpy(&dst->ce_stat_data, &src->ce_stat_data,
           offsetof(struct cache_entry, name) -
           offsetof(struct cache_entry, ce_stat_data));
    /* Restore the hash state */
    dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state;
}
I was wondering what the advantages of using static inline functions are compared to regular functions. Is there any guideline one can use to choose which style to adopt?
Inlining is done for optimization. However, a little known fact is that inline can also hurt performance: Your CPU has an instruction cache with a fixed size, and inlining has the downside of replicating the function at several places, which makes the instruction cache less efficient.
So, from a performance point of view, it's generally not advisable to declare functions inline unless they are so short that their call is more expensive than their execution.
To put this in relation: a function call takes somewhere between 10 and 30 cycles of CPU time (depending on the number of arguments). Arithmetic operations generally take a single cycle; however, a memory load from the first-level cache takes something like three to four cycles. So, if your function is more complex than a simple sequence of at most three memory accesses and some arithmetic, there is little point in inlining it.
I usually take this approach:
If a function is as simple as incrementing a single counter, and if it is used all over the place, I inline it. Examples of this are rare, but one valid case is reference counting.
If a function is used only within a single file, I declare it as static, not inline. This has the effect that the compiler can see when such a function is used precisely one time. And if it sees that, it will very likely inline it, no matter how complex it is, since it can prove that there is no downside of inlining.
All other functions are neither static nor inline.
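The reference-counting case mentioned in the first rule might look like the sketch below. The struct and function names are hypothetical, and the counter is deliberately not thread-safe; it only shows how small a function has to be for the rule to apply.

```c
/* Hypothetical reference-counted object. */
struct object {
    unsigned refcount;
};

/* A single increment: the body is cheaper than a call would be. */
static inline void object_ref(struct object *obj)
{
    ++obj->refcount;
}

/* Returns 1 when the last reference was dropped, 0 otherwise. */
static inline int object_unref(struct object *obj)
{
    return --obj->refcount == 0;
}
```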
The example in your question is a borderline example: It contains a function call, thus it seems to be too complex for inlining at first sight.
However, the memcpy() function is special: it is seen more as a part of the language than as a library function. Most compilers will inline it, and optimize it heavily when the size is a small compile time constant, which is the case in the code in question.
With that optimization, the function is indeed reduced to a short, simple sequence. I cannot say whether it touches a lot of memory because I don't know the structure that is copied. If that structure is small, adding the inline keyword seems to be a good idea in this case.
inline allows you to define functions in a header.
static makes the function available only in the current translation unit.
The main reason is performance: when used appropriately, inline functions could enable the compiler to generate more efficient code.
A good strategy for identifying performance bottlenecks is to profile the code. Once that's done, the most effective way to improve performance is by focusing on the bottlenecks. There are many strategies, such as algorithmic improvements, etc. One such strategy is to make short, frequently used functions inline.
Like with any other attempts to improve performance, the result needs to be tested to ensure that the change was actually beneficial.
The potential advantage of inlining is that a function call can be avoided, which may save some execution time and stack memory, sacrificing some space for the executable (if the function is used more than once). It may also allow further optimizations by eliminating dead code (e.g. a function returning an error code for an invalid argument, calling a function doing the same checks, where the second identical check can be removed when inlined). Note that inlining as an optimization technique and an inline definition (as defined by the C standard) are two different things: a compiler may inline every function where it sees a definition, and may decide to perform an actual function call for an inline function.
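The duplicated-check case can be sketched as follows; first_byte and first_two_bytes_sum are invented names. The outer function validates its arguments and then calls a helper that validates again. If the helper is inlined, an optimizer is in a position to prove the inner checks false and drop them, though whether it does so depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical helper: validates its arguments, then does the work. */
static inline int first_byte(const unsigned char *buf, size_t len,
                             unsigned char *out)
{
    if (buf == NULL || len == 0)
        return -1;
    *out = buf[0];
    return 0;
}

int first_two_bytes_sum(const unsigned char *buf, size_t len, unsigned *sum)
{
    unsigned char a = 0, b = 0;

    if (buf == NULL || len < 2)
        return -1;
    /* After the check above, the checks inside first_byte cannot fail for
       either call; once inlined they are dead code. */
    if (first_byte(buf, len, &a) != 0 || first_byte(buf + 1, len - 1, &b) != 0)
        return -1;
    *sum = (unsigned)a + b;
    return 0;
}
```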
Every function declared static in a strictly conforming program can be declared inline. This is only a hint for the compiler and doesn't have any semantic meaning (nb, for functions with external linkage, there is a difference).
Sometimes, static inline is seen as a type-checking alternative for a macro function and thus could be seen as serving some documentation purposes.
It's important to document a function as being static and defined in the header (or at least as being potentially static and defined in the header), as the user of such a header must not assume that taking the address of the function in different translation units yields equal results.
If a definition should be in a header (to allow inlining), I personally prefer inline functions with external linkage, as addresses compare equal and the compiler still can inline if it thinks it's worth it.
static inline functions are especially safe to use on microcontrollers. These devices often do not have an instruction cache (check the datasheet), so inlining always saves time.

What is better: function or define

I have a couple of simple functions like
#define JacobiLog(x1,x2) ((x1>x2)?x1:x2)+log(1+exp(-fabs(x1-x2)))
Which is better to implement (in terms of code, compilation, memory...): the define as above, or a simple function such as
double JacobiLog(double x1,double x2)
{
    return ((x1>x2) ? x1 : x2) + log(1+exp(-fabs(x1-x2)));
}
The compiler will probably inline your function automatically. You should use the function and not a define.
It will also avoid unexpected behaviour in the case where you use your define as
double num = JacobiLog(x++, y++);
I'll let you imagine the problem with textual replacement...
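The problem is that a macro pastes its argument text in everywhere the parameter appears, so side effects run more than once. Pasting x++ into the JacobiLog macro several times would not even have defined behaviour, so the sketch below shows the same effect with a simpler, well-defined case: a hypothetical MAX_MACRO and a counter that records how often the argument expression is evaluated.

```c
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

static inline int max_func(int a, int b)
{
    return a > b ? a : b;
}

static int calls;  /* how many times next_value() has run */

static int next_value(void)
{
    return ++calls;
}

int call_count(void)
{
    return calls;
}

int macro_result(void)
{
    calls = 0;
    /* Expands to ((next_value()) > (0) ? (next_value()) : (0)):
       the argument runs twice, so this returns 2. */
    return MAX_MACRO(next_value(), 0);
}

int function_result(void)
{
    calls = 0;
    /* The argument is evaluated once before the call, so this returns 1. */
    return max_func(next_value(), 0);
}
```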
A define can possibly be a little faster, but most probably the compiler will inline the function anyway (or you can mark it as inline) and they will be the same. But the function is better, because it is more readable and easier to debug.
The function is better, assuming a good compiler.
With the function, it is left to the compiler whether the code is inlined, or not (assuming the definition of the function is accessible to everyone who uses it, for example if it is an inline function declared in a header for C++, or just a plain function with all of its users in the same translation unit). With the macro, it is always inlined, which is not necessarily faster, as it may lead to code bloat and therefore more cache misses and page faults.
Not to mention macros are difficult to read and, even worse, to debug.
Even though the 'define' is faster (since it prevents a function call), the compiler can optimize and inline your function, and make it as fast.
If you are in a C++ environment, you should always use templates and functions. It will make your program more readable and prevent type errors.
In C, a macro can be useful since the type is not specified (see example below):
/* Will work with int, long, double, short, etc. */
#define HIGHER(VAL1, VAL2) ((VAL1) > (VAL2) ? (VAL1) : (VAL2))
It's a micro-optimization. Unless you're doing embedded programming and every instruction counts, go with the function. Not to mention that the log is likely about 100x slower than the overhead to call a function. So you can only get about a 1% saving if your program consists mainly of calling this function. [1] Once your program starts doing significant other things, this saving will be reduced to basically unnoticeable.
The compiler is free to inline the function wherever possible, which would make the two identical. However, you can't force the compiler to do so. There is an inline keyword in C++, but this is just a hint, the compiler is free to ignore it.
See this for some differences between the two (this covers inline versus non-inline functions, but, as stated above, inline functions are essentially the same as #define's). The basic conclusion to the link is "it depends".
Also note that, behaviourally, a #define and a function are not 100% equivalent.
[1]: Figures largely made up. Benchmark if you want accurate results.
First (for a complete answer) we have to acknowledge that using a macro can have surprise side-effects which you might not intend, and that a function ensures that you know the incoming types and you know that each parameter is evaluated exactly once.
More often than not, these effects of using a macro are a source of problems.
Generally a compiler will inline the function as appropriate, and if it does its job right then it should have nearly all the advantages of a macro but without the rarely-intended side-effects.
Occasionally, though, you can actually get some benefits that an inlining compiler mightn't recognise. For example your macro will temporarily defer converting the arguments to double if they were int or long and perform more operations in integer arithmetic (which might have a performance or precision advantage). You might also get integer overflow and incorrect results.
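A small sketch of that difference, using a hypothetical HALF_SUM macro and half_sum function. With int arguments the macro does the whole computation in integer arithmetic and converts only the result, while the function converts the arguments to double first.

```c
#define HALF_SUM(a, b) (((a) + (b)) / 2)

static inline double half_sum(double a, double b)
{
    return (a + b) / 2;
}

double via_macro(void)
{
    /* int + int, then integer division: 7 / 2 is 3, converted to 3.0. */
    return HALF_SUM(3, 4);
}

double via_function(void)
{
    /* Arguments become 3.0 and 4.0 before any arithmetic: result 3.5. */
    return half_sum(3, 4);
}
```

The integer path may be faster on some targets, but as the example shows it can also silently change the result.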
Since you included 'memory' in your list of "better" factors, it's tempting to say that the function is smaller (assuming you configure your compiler to optimise for size), but this isn't necessarily true.
Obviously, as a function you need only one copy of it in memory and all callers can use that same code, whereas inlining or expanding it at every use duplicates the code. Your compiler is very unlikely to isolate a macro and convert it into a function called from many different places in the code.
Where a never-inlined function can fail to be smaller is where it stands in the way of simplifications. There are three common cases I can think of:
If all of the uses of the function involve constant parameters, the inlined simplifications might come out smaller than the whole original function.
The register marshalling code required to execute a function call with the parameters in the correct registers can be longer than the function itself.
Adding a function call can add to the register pressure in the caller, forcing it to generate more complicated code, possibly forcing it to create a stack frame and save more registers on entry and exit.
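The first of these cases, constant parameters, can be sketched like this; power and cube are illustrative names. Called out of line, power needs a loop. Inlined with the constant 3, an optimizer can reduce the whole thing to two multiplications, which may well be smaller than the call sequence plus the general function.

```c
static inline unsigned power(unsigned base, unsigned exponent)
{
    unsigned result = 1;
    for (unsigned i = 0; i < exponent; ++i)
        result *= base;
    return result;
}

unsigned cube(unsigned x)
{
    /* The exponent is a compile-time constant at this call site, so the
       inlined loop can be fully unrolled and simplified. */
    return power(x, 3);
}
```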

C - Parameterized Macros

I can't figure out what the advantage is of using
#define CRANDOM() (random() / 2.33);
instead of
float CRANDOM() {
    return random() / 2.33;
}
By using a #define macro you are forcing the body of the macro to be inserted inline.
When using a function there will[1] be a function call (and therefore a jump to the address of the function, among other things), which will slow down performance somewhat.
The former will most often be faster, even though the size of the executable will grow for each use of the #defined macro.
greenend.org.uk - Inline Functions In C
[1] A compiler might be smart enough to optimize away the function call and inline the function, effectively making it the same as using a macro. But for the sake of simplicity we will disregard this in this post.
It makes sure that the call to CRANDOM is inlined, even if the compiler doesn't support inlining.
Firstly, that #define is incorrect due to the semi-colon at the end, and the compiler would baulk at:
float f = CRANDOM() * 2;
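With the trailing semicolon, that line expands to float f = (random() / 2.33); * 2; which does not compile. A corrected sketch is shown below. To keep it deterministic it uses fake_random, a hypothetical stand-in for random(), and the names CRANDOM_FIXED and crandom are likewise invented for the example.

```c
/* Hypothetical stand-in for random() so the result is predictable. */
static long fake_random(void)
{
    return 233;
}

/* Macro form without the trailing semicolon, so it can be used inside a
   larger expression. */
#define CRANDOM_FIXED() (fake_random() / 2.33)

/* Function form: same computation, with a real return type. */
static inline float crandom(void)
{
    return (float)(fake_random() / 2.33);
}

float doubled_macro(void)
{
    return (float)(CRANDOM_FIXED() * 2);
}

float doubled_function(void)
{
    return crandom() * 2;
}
```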
Secondly, I personally try and avoid using the preprocessor beyond separating platform-independent sections in cross-platform code, and of course code reserved exclusively for DEBUG or non-DEBUG builds.
nightcracker correctly states it will always be "effectively" inline, but given you can re-write the function to be inline itself, I see no advantage to using the preprocessor version unless the C-compiler in question does not inline.
The former is old style. The only advantage of the former is that if you have a compiler following the old C90 standard, the macro will work as inlining. On a modern C compiler you should always write:
inline float CRANDOM() {
    return random() / 2.33f;
}
where the inline keyword is optional.
(Note that float literals must have an f at the end; otherwise you force the calculation to be performed in double, which you then implicitly round to a float.)
Calling a function involves a little bit of overhead -- pushing the return address onto the machine stack and branching, in this case. By using a macro, you can avoid this overhead. A long time ago, this was important; these days, many compilers will insert the body of a tiny function like this inline, anyway. In general, trying to fool the compiler into emitting faster code is a fool's game; you often end up creating something slower.

When to use function-like macros in C

I was reading some code written in C this evening, and at the top of
the file was the function-like macro HASH:
#define HASH(fp) (((unsigned long)fp)%NHASH)
This left me wondering, why would somebody choose to implement a
function this way using a function-like macro instead of implementing
it as a regular vanilla C function? What are the advantages and
disadvantages of each implementation?
Thanks a bunch!
Macros like that avoid the overhead of a function call.
It might not seem like much. But in your example, the macro turns into 1-2 machine language instructions, depending on your CPU:
Get the value of fp out of memory and put it in a register
Take the value in the register, do a modulus (%) calculation by a fixed value, and leave that in the same register
whereas the function equivalent would be a lot more machine language instructions, generally something like
Stick the value of fp on the stack
Call the function, which also puts the next (return) address on the stack
Maybe build a stack frame inside the function, depending on the CPU architecture and ABI convention
Get the value of fp off the stack and put it in a register
Take the value in the register, do a modulus (%) calculation by a fixed value, and leave that in the same register
Maybe take the value from the register and put it back on the stack, depending on CPU and ABI
If a stack frame was built, unwind it
Pop the return address off the stack and resume executing instructions there
A lot more code, eh? If you're doing something like rendering every one of the tens of thousands of pixels in a window in a GUI, things run an awful lot faster if you use the macro.
Personally, I prefer using C++ inline as being more readable and less error-prone, but inlines are also really more of a hint to the compiler which it doesn't have to take. Preprocessor macros are a sledge hammer the compiler can't argue with.
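In C the same idea can be written as a static inline function. The sketch below puts the two forms side by side. NHASH is given a hypothetical value because the question does not show one, hash_ptr is an invented name, and the macro's parameter is parenthesized here (unlike the original) so that an argument such as a + b is cast as a whole.

```c
#include <stdint.h>

#define NHASH 101u  /* hypothetical table size */

/* Macro form: accepts anything convertible to unsigned long. */
#define HASH(fp) (((unsigned long)(fp)) % NHASH)

/* Function form: one fixed parameter type, argument evaluated once. */
static inline unsigned long hash_ptr(const void *fp)
{
    return (unsigned long)(uintptr_t)fp % NHASH;
}
```

With optimization enabled, a compiler will usually generate the same few instructions for both; the function form simply adds type checking on the argument.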
One important advantage of macro-based implementation is that it is not tied to any concrete parameter type. A function-like macro in C acts, in many respects, as a template function in C++ (templates in C++ were born as "more civilized" macros, BTW). In this particular case the argument of the macro has no concrete type. It might be absolutely anything that is convertible to type unsigned long. For example, if the user so pleases (and if they are willing to accept the implementation-defined consequences), they can pass pointer types to this macro.
Anyway, I have to admit that this macro is not the best example of type-independent flexibility of macros, but in general that flexibility comes handy quite often. Again, when certain functionality is implemented by a function, it is restricted to specific parameter types. In many cases in order to apply similar operation to different types it is necessary to provide several functions with different types of parameters (and different names, since this is C), while the same can be done by just one function-like macro. For example, macro
#define ABS(x) ((x) >= 0 ? (x) : -(x))
works with all arithmetic types, while function-based implementation has to provide quite a few of them (I'm implying the standard abs, labs, llabs and fabs). (And yes, I'm aware of the traditionally mentioned dangers of such macro.)
Macros are not perfect, but the popular maxim about "function-like macros being no longer necessary because of inline functions" is just plain nonsense. In order to fully replace function-like macros C is going to need function templates (as in C++) or at least function overloading (as in C++ again). Without that function-like macros are and will remain extremely useful mainstream tool in C.
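Since C11 the language does have a limited form of type dispatch, the _Generic selection, which still needs a macro as its front end. A minimal sketch, assuming a C11 compiler; abs_i, abs_d and ABS_T are hypothetical names, and only two types are covered to keep it short.

```c
static inline int abs_i(int x)
{
    return x < 0 ? -x : x;
}

static inline double abs_d(double x)
{
    return x < 0 ? -x : x;
}

/* Picks a function from the static type of the argument. Unlike the ABS
   macro above, x is evaluated only once: the controlling expression of
   _Generic is not evaluated. */
#define ABS_T(x) _Generic((x), int: abs_i, double: abs_d)(x)
```

Passing a type that is not listed (for example long) is a compile-time error here, which is stricter than the plain ABS macro.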
On one hand, macros are bad because they're done by the preprocessor, which doesn't understand anything about the language and does text-replace. They usually have plenty of limitations. I can't see one above, but usually macros are ugly solutions.
On the other hand, they are at times even faster than a static inline method. I was heavily optimizing a short program and found that calling a static inline method takes about twice as much time (just overhead, not actual function body) as compared with a macro.
The most common (and most often wrong) reason people give for using macros (in "plain old C") is the efficiency argument. Using them for efficiency is fine if you have actually profiled your code and are optimizing a true bottleneck (or are writing a library function that might be a bottleneck for somebody someday). But most people who insist on using them have not actually analyzed anything and are just creating confusion where it adds no benefit.
Macros can also be used for some handy search-and-replace type substitutions which the regular C language is not capable of.
One problem I have had in maintaining code written by macro abusers is that the macros can look quite like functions but do not show up in the symbol table, so it can be very annoying trying to trace them back to their origins in sprawling codebases (where is this thing defined?!). Writing macros in ALL CAPS is obviously helpful to future readers.
If they are more than fairly simple substitutions, they can also create some confusion if you have to step-trace through them with a debugger.
Your example is not really a function at all,
#define HASH(fp) (((unsigned long)fp)%NHASH)
// this is a cast  ^^^^^^^^^^^^^^^
// this is your value 'fp'        ^^
// this is a MOD operation           ^^^^^^
I'd think this was just a way of writing more readable code, with the cast and mod operation wrapped into a single macro 'HASH(fp)'.
Now, if you decide to write a function for this, it would probably look like,
int hashThis(int fp)
{
    return ((fp)%NHASH);
}
That is quite an overkill for such a small operation, as a function:
introduces a call point
introduces call-stack setup and restore
The C Preprocessor can be used to create inline functions. In your example, the code will appear to call the function HASH, but instead is just inline code.
The benefits of using macro functions were eliminated when C++ introduced inline functions. Many older APIs like MFC and ATL still use macro functions to do preprocessor tricks, but it just leaves the code convoluted and harder to read.
