I've lately encountered a lot of functions where gcc generates really bad code on x86. They all fit a pattern of:
if (some_condition) {
    /* do something really simple and return */
} else {
    /* something complex that needs lots of registers */
}
Think of the simple case as something so small that half or more of the work is spent pushing and popping registers that won't be modified at all. If I were writing the asm by hand, I would save and restore the saved-across-calls registers inside the complex case, and avoid touching the stack pointer at all in the simple case.
Is there any way to get gcc to be a little bit smarter and do this itself? Preferably with command line options rather than ugly hacks in the source...
Edit: To make it concrete, here's something very close to some of the functions I'm dealing with:
if (buf->pos < buf->end) {
    return *buf->pos++;
} else {
    /* fill buffer */
}
and another one:
if (!initialized) {
    /* complex initialization procedure */
}
return &initialized_object;
and another:
if (mutex->type == SIMPLE) {
    return atomic_swap(&mutex->lock, 1);
} else {
    /* deal with ownership, etc. */
}
Edit 2: I should have mentioned to begin with: these functions cannot be inlined. They have external linkage and they're library code. Allowing them to be inlined in the application would result in all kinds of problems.
Update
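To explicitly suppress inlining for a single function in gcc, put the attribute before the declarator (gcc rejects an attribute placed after the parameter list of a function definition):
__attribute__ ((noinline)) void foo()
{
    ...
}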
See also How can I tell gcc not to inline a function?
Functions like this will regularly be inlined automatically unless compiled with -O0 (optimization disabled).
In C++ you can hint the compiler using the inline keyword
If the compiler won't take your hint you are probably using too many registers/branches inside the function. The situation is almost certainly resolved by extracting the 'complicated' block into its own function.
Update: I noticed you added the fact that they are extern symbols. (Please update the question with that crucial info.) Well, in a sense, with external functions, all bets are off. I cannot really believe that gcc will by definition inline all of a complex function into a tiny caller simply because it is only called from there. Perhaps you can give some sample code that demonstrates the behaviour and we can find the proper optimization flags to remedy that?
Also, is this C or C++? In C++ I know it is commonplace to include the trivial decision functions inline (mostly as members defined in the class declaration). This won't give a linkage conflict like with simple (extern) C functions.
Also you can have template functions defined that will inline perfectly in all compilation modules without resulting in link conflicts.
I hope you are using C++ because it will give you a ton of options here.
I would do it like this:
static void complex_function() {}
void foo()
{
    if(simple_case) {
        // do whatever
        return;
    } else {
        complex_function();
    }
}
The compiler may insist on inlining complex_function(), in which case you can use the noinline attribute on it.
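As a slightly fuller sketch of this split, here is the buffer example from the question with the slow path moved into a separate noinline function. The struct layout, the names buf_getc and buf_refill, and the refills counter are all assumptions made for illustration; the refill body is only a placeholder. The noinline attribute is a GCC/Clang extension.

```c
#include <stddef.h>

/* Hypothetical buffer type, modeled on the question's buf->pos / buf->end. */
struct buf {
    const unsigned char *pos;
    const unsigned char *end;
    int refills;              /* counts slow-path calls, for illustration only */
};

/* Slow path: kept out of line so that whatever registers and stack it
   needs are set up inside this function rather than in buf_getc(). */
__attribute__((noinline))
static int buf_refill(struct buf *b)
{
    b->refills++;
    return -1;                /* placeholder: a real version would refill the buffer */
}

/* Fast path: one comparison, one load, one increment. */
int buf_getc(struct buf *b)
{
    if (b->pos < b->end)
        return *b->pos++;
    return buf_refill(b);     /* call into the cold code only when needed */
}
```

Whether the fast path really ends up free of pushes and pops depends on the compiler version and flags, so it is worth checking the output of gcc -O2 -S for the function in question.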
Perhaps upgrade your version of gcc? 4.6 has just been released. As far as I understand, it supports "partial inlining". That is, an outer part of a function that is easy to integrate is inlined, and the expensive part is transformed into a call. But I have to admit that I haven't tried it myself yet.
Edit: The statement I was referring to from the ChangeLog:
Partial inlining is now supported and enabled by default at -O2 and greater. The feature can be controlled via -fpartial-inlining.
Partial inlining splits functions with short hot path to return. This allows more aggressive inlining of the hot path leading to better performance and often to code size reductions (because cold parts of functions are not duplicated).
...
Inlining when optimizing for size (either in cold regions of a program or when compiling with -Os) was improved to better handle C++ programs with larger abstraction penalty, leading to smaller and faster code.
I would probably refactor the code to encourage inlining of the simple case. That said, you can use -finline-limit to make gcc consider inlining larger functions, or -fomit-frame-pointer -fno-exceptions to minimize the stack frame. (Note that the latter may break debugging and cause C++ exceptions to misbehave badly.)
Probably you won't be able to get much from tweaking compiler options, though, and will have to refactor.
Seeing as these are external calls, it might be possible that gcc is treating them as unsafe and preserving registers for the function call (hard to know without seeing the registers that it preserves, including the ones you say 'aren't used'). Out of curiosity, does this excessive register spilling still occur with all optimizations disabled?
Related
I'm reading What Every Programmer Should Know About Memory
https://people.freebsd.org/~lstewart/articles/cpumemory.pdf and it says that inline functions make your code more optimizable
for example:
Inlining of functions, in particular, allows the compiler to optimize larger chunks of code at a time which, in turn, enables the generation of machine code which better exploits the processor’s pipeline architecture
and:
The handling of both code and data (through dead code elimination or value range propagation, and others) works better when larger parts of the program can be considered as a single unit.
and this also:
If a function is only called once it might as well be inlined. This gives the compiler the opportunity to perform more optimizations (like value range propagation, which might significantly improve the code).
After reading these, to me at least it seems like inline functions are easier to optimize, but why? Why is it easier to optimize something that is inline?
The reason it is easier to do a better job optimizing inlined functions than outlined ones is that the compiler knows the complete context in which the function is called and can use this information to tune the generated code to match this exact context. This often allows more efficient code for the inlined copy of the function but also for the calling function. The caller and the callee can be optimized to fit each other in a way that is not possible for outlined functions.
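A small sketch of what "knowing the context" can mean in practice. The functions scale and double_it are hypothetical names made up for this illustration; the point is only that the caller passes a constant the callee branches on.

```c
/* Hypothetical helper: the result depends on a mode parameter. */
static inline int scale(int value, int mode)
{
    if (mode == 0)
        return value;          /* pass through */
    else if (mode == 1)
        return value * 2;
    else
        return value * value;
}

/* The caller passes a constant mode. Once scale() is inlined here, an
   optimizing compiler can see that mode == 1 and will typically reduce
   the body to a single doubling, dropping the other branches as dead
   code. An out-of-line scale() has to keep all three branches. */
int double_it(int value)
{
    return scale(value, 1);
}
```

Comparing the assembly for double_it at -O2 with and without a noinline attribute on scale should make the difference visible, though the exact output varies by compiler.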
There is no difference!
All functions are subject to being inlined by gcc in -O3 optimization mode, whether declared inline, static, neither or both.
see: https://stackoverflow.com/a/40783656/9925764
or here is a modified version of @Eugene Sh.'s example without the noinline option:
https://godbolt.org/z/arPEf7rd4
Recently I came across some embedded code that uses
#define print() printf("hello world")
instead of
void print() { printf("hello world"); }
My question: what is the gain from using #define instead of creating a function?
It may be related to performance.
A function call has some overhead (i.e. calling, saving things on the stack, returning, etc.) while a macro is a direct substitution of the macro name with its contents (i.e. no overhead).
In this example the functions foo and bar do exactly the same thing: foo uses a macro while bar uses a function call.
As you can see, bar and printY together require more instructions than foo.
So by using a macro the performance got a little better.
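The comparison described above can be sketched roughly as follows. The names foo, bar and printY come from the text; the macro name PRINT_Y, the format string and the return values are assumptions added so the snippet is self-contained.

```c
#include <stdio.h>

/* Hypothetical macro: the preprocessor pastes the printf call into foo(). */
#define PRINT_Y(y) printf("y = %d\n", (y))

/* Function doing the same work, reached through a call. */
int printY(int y)
{
    return printf("y = %d\n", y);
}

int foo(int y)
{
    return PRINT_Y(y);   /* expands to printf(...) in place */
}

int bar(int y)
{
    return printY(y);    /* without inlining: a call to printY, then to printf */
}
```

Compiling this with gcc -O0 -S shows the extra call and return for bar; with optimization enabled, the compiler may well inline printY into bar and produce the same code for both.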
But... there are downsides to this approach:
Macros are hard to debug as you can't single step a macro
Extensive use of a macro increases the size of the binary (compared to using function call). Something that can impact performance in a negative direction.
Also notice that modern compilers (with optimization on) are really good at figuring out when it's a good idea to automatically inline a function (i.e. your code is written with a function call but the compiler decides to inline the function as if it was a macro). So you might get the same performance using function call.
Further, you can use the inline keyword as a hint to the compiler that you think it will be good to inline a function. But even with that keyword the compiler may decide not to inline. The only way to make sure that the code gets inlined is by using a macro.
There is no advantage. Using #define like this is quite ancient C programming style.
In the year 1999, the C language got the inline keyword to make all such macros obsolete. And with modern compilers, inline is often superfluous too, since the compiler is nowadays better than the programmer when it comes to determining when to inline.
Some of the embedded compilers out there can still be rather bad at such optimizations though, and that's why embedded C code tends to lag behind in modernization.
In general, doing micro-optimizations like this is called "premature optimization", meaning the programmer is meddling with optimizations that they should leave to the compiler, even in hard real-time systems. Manual optimization should be a last resort, used only once you have 1) detected an actual bottleneck, and 2) disassembled the code to see whether manual inlining actually does anything good for performance.
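One concrete reason an inline function is preferable to a function-like macro, beyond style, is that a macro pastes its argument expression wherever the parameter appears. The sketch below uses made-up names (SQUARE_MACRO, square, next_value, calls) to show the difference.

```c
/* Function-like macro: the argument expression is pasted in twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* C99 alternative: the argument is evaluated exactly once and type-checked. */
static inline int square(int x)
{
    return x * x;
}

/* Hypothetical helper with a side effect, to make the difference visible. */
static int calls = 0;

static int next_value(void)
{
    ++calls;
    return 3;
}
```

SQUARE_MACRO(next_value()) calls next_value twice, while square(next_value()) calls it once, and both give the same result only because next_value always returns the same number.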
Sometimes you want to stub out functionality at compile time. Macros give you an easy way to do this.
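A minimal sketch of such a compile-time stub, assuming a hypothetical build flag ENABLE_TRACE (for example passed as -DENABLE_TRACE) and a made-up TRACE macro:

```c
#include <stdio.h>

/* ENABLE_TRACE is a hypothetical build flag, not a standard one. */
#ifdef ENABLE_TRACE
#define TRACE(msg) fprintf(stderr, "trace: %s\n", (msg))
#else
#define TRACE(msg) ((void)0)   /* stubbed out: no code is generated */
#endif

int add(int a, int b)
{
    TRACE("add called");       /* disappears entirely in non-trace builds */
    return a + b;
}
```

The call sites stay the same in both builds; only the macro definition changes.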
In JavaScript, there are, often, huge performance penalties for writing functions. For example, if you use this function:
function double(x){ return x*2; }
inside an inner loop, you are probably hurting your performance considerably, so it is really profitable to inline that kind of function for intensive applications. Does this, in general, hold for C? Am I free to create those kinds of functions for everything, and rest assured the compiler will do the job, or is hand inlining still important?
The answer is: it depends.
I'm currently using MSVC compiler and GCC for a project at work and my experience is that they both do a pretty good job. Furthermore, the cost of a function call in native code can be pretty small, especially in functions that do not need to be accessible outside the executable (like functions not exported in a shared library). For these functions, there is more flexibility with how the call is actually implemented.
A few things to note: it's much easier for a compiler to optimize calls to static functions. Functions with external linkage often require link time optimization since one must know how and where the function is actually called, as well as the implementation, to do much optimization or inlining. This requires examining more than one compilation unit at a time.
I would say that you should use functions where it makes sense and makes the code easier to read and maintain. In general, it is safe to assume that the cost is smaller than it would be in JavaScript. But in the end, you'd have to profile the code to say anything more precise.
UPDATE: I want to emphasize that functions can be inlined across compilation units, but this requires link-time optimization (or whole program optimization). This is supported in both GCC (https://gcc.gnu.org/wiki/LinkTimeOptimization) and MSVC (http://msdn.microsoft.com/en-us/library/0zza0de8.aspx).
These days, if you can beat the compiler by copying the body of a function and pasting it everywhere you call that function, you probably need a different compiler.
In general, with optimizations turned on, gcc will tend to inline short functions provided that they are defined in the same compilation unit that they are called in.
Moreover, if the calling function and called function are in different compilation units, the compiler does not have a chance to inline them regardless of what you request.
So, if you want to maximize the chance of the compiler optimizing away a function call (without manually inlining), you should define the function in a .h file or in the same .c file that it is called in.
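One common way to do this in C is a static inline function in a header. The sketch below shows a hypothetical header clamp.h and a user of it in a single listing; the file names and the functions clamp and clamp_percent are made up for illustration.

```c
/* clamp.h (hypothetical header) */
#ifndef CLAMP_H
#define CLAMP_H

/* static inline: every .c file that includes this header gets its own
   private copy, so the body is visible at each call site and no
   duplicate-symbol error can occur at link time. */
static inline int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

#endif /* CLAMP_H */

/* user.c (hypothetical): would normally start with #include "clamp.h" */
int clamp_percent(int value)
{
    return clamp(value, 0, 100);
}
```

The trade-off is that any copy the compiler decides not to inline is duplicated in each translation unit that uses it.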
There are no inner functions in C. Dot. So the rest of your question is kind of irrelevant.
Anyway, as for "normal" functions in C, the compiler may or may not inline them (replace the function invocation with its body). If you compile your code with "optimize for size", it may decide not to inline, for obvious reasons.
I currently have inline functions calling another inline function (a simple four-line getAbs() function). However, I discovered by looking at the assembler code that the "big" inline functions are inlined as expected, but the compiler uses a bl branch to call the getAbs() function.
Is it not possible to inline a function in another inline function? By the way, this is embedded code, we are not using the standard libraries.
Edit : The compiler is WindRiver, and I already checked that inlining would be beneficial (4 instructions instead of +-40).
Depending on what compiler you are using you may be able to encourage the compiler to be less reluctant to inline, e.g. with gcc you can use __attribute__ ((always_inline)), with Intel ICC you can use icc -inline-level=1 -inline-forceinline, and with Apple's gcc you can use gcc -obey-inline.
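For the gcc case, a sketch might look like the following. The body of getAbs is a guess, since the question does not show it, and distance is a made-up caller. always_inline is a GCC/Clang extension; the WindRiver compiler mentioned in the question may need a different spelling or option.

```c
/* Hypothetical stand-in for the getAbs() mentioned in the question.
   Like abs(), the result is undefined for INT_MIN. */
static inline __attribute__((always_inline)) int getAbs(int x)
{
    return x < 0 ? -x : x;
}

/* A larger inline function that calls the small one. With the attribute
   above, gcc inlines getAbs() here regardless of its usual heuristics. */
static inline int distance(int a, int b)
{
    return getAbs(a - b);
}
```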
The inline keyword is a suggestion to the compiler, nothing more. It's free to take that suggestion on board, totally ignore it or even lie to you and tell that it's doing it while it's really not.
The only way to force code to be inline is to, well, write it inline. But, even, then the compiler may decide it knows better and decide to shift it out to another function. It has a lot of leeway in generating executable code for your particular source, provided it doesn't change the semantics of it.
Modern compilers are more than capable of generating better code than most developers would hand-craft in assembly. I think the inline keyword should go the same path as the register keyword.
If you've seen the output of gcc at its insane optimisation level, you'll understand why. It has produced code that I wouldn't have dreamed possible, and that took me a long time to understand.
As an aside, check this out for what optimisations that gcc actually has, including a great many containing the text "inline" or "inlining".
@gramm: There are quite a few scenarios in which inlining isn't necessarily to your benefit. Most compilers use some very advanced heuristics to determine when to inline. When discussing inlining, the simplest idea is: trust your compiler to produce the fastest code.
I recently had a very similar problem, and reading this post has given me a wacky idea. Why not have a simple pre-compilation pass (a simple regex should do the job) that replaces each function call with the function's source code inline? Use tags such as /inline/ and /end_of_inline/ so that you can still use normal IDE features (if you use, or might use, an IDE).
Include this in your build process; that way you keep the readability advantage while removing the compiler's assumption that you are only as good a developer as most and do not understand when to inline.
Nonetheless, before trying this you should probably go through the compiler's command line options.
I would suggest that if your getAbs() function (sounds like absolute value but you really should be showing us code with the question...) is 4 lines long, then you have much bigger optimizations to worry about than whether the code gets inlined or not.
Every time I read about the "inline" declaration in C it is mentioned that it is only a hint to the compiler (i.e. it does not have to obey it). Is there any benefit to adding it then, or should I just rely on the compiler knowing better than me?
There are two reasons to use the inline keyword. One is an optimization hint, and you can safely ignore it; your compiler is likely to ignore it too. The other reason is to allow a function to exist in multiple translation units, and that usage is strictly necessary. If you put a function into a .h header file for example, you'd better declare it inline.
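In C (C99 and later), the multiple-translation-unit use works a little differently from C++, so here is a sketch under C99 semantics. The file names and the functions twice and quadruple are hypothetical; both "files" are shown in one listing.

```c
/* twice.h (hypothetical): an inline definition, visible to every includer. */
inline int twice(int x)
{
    return 2 * x;
}

/* twice.c (hypothetical): in exactly one translation unit, this
   declaration makes the definition above the external one, so any call
   the compiler chose not to inline still has a symbol to link against. */
extern inline int twice(int x);

int quadruple(int x)
{
    return twice(twice(x));
}
```

If the extern inline declaration is left out everywhere and the compiler does not inline a call (for example at -O0), linking can fail with an undefined reference to twice. Using static inline in the header avoids that at the cost of possible duplicate copies.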
Compilers are smart, but can still benefit from hints from the developer. If you think some of your functions in particular should be inlined, then declare them as such. It certainly doesn't hurt.
Generally modern compilers will "inline" things they deem important. I'd let it handle it for you.
Edit:
After reading what others have written, you know what? I think I'd let it handle most of the inlining, THEN profile your code, and THEN inline functions which are bottlenecks. My advice is slightly colored by a certain developer I work alongside who pre-optimizes all his code. Half the time I need to spend 5 min. just figuring out what is trying to be accomplished.
Like everyone else has said, the keyword is only a hint, but it's a hint most compilers take pretty seriously. Also, most compilers are very bad at inlining functions from different compilation units: if you define Foo() in file a.c, but call it in b.c, odds are pretty slim that it will intelligently inline b.c's calls to Foo(). (In fact, it won't happen at all without LTCG.) So it's worth using when you're sure a function really needs to be inlined. That's a decision best made with empirical timings.
For example, on my platform I measured the difference between an inline and a direct (nonvirtual) function as about 5 nanoseconds, so my rule of thumb is that I inline functions that should take less than ten cycles: trivial accessors and the like. Other specific functions get inlined afterwards when my profiler tells me that I'm wasting a lot of time in the overhead of calling them.
The C++ FAQ has good info on this. I prefer to use the inline function as it gives the compiler more information about what I would "like" it to do. Whether or not the compiler ends up inlining it is up to it, but giving it a little help won't hurt.
It provides a simple mechanism for the compiler to apply more OPTIMIZATIONS.
Inline functions are faster because you don't need to push and pop things on/off the stack like parameters and the return address; however, it does make your binary slightly larger.
Does it make a significant difference? Not noticeably enough on modern hardware for most. But it can make a difference, which is enough for some people.
Marking something inline does not give you a guarantee that it will be inline. It's just a suggestion to the compiler. Sometimes it's not possible such as when you have a virtual function, or when there is recursion involved. And sometimes the compiler just chooses not to use it.
I could see a situation like this making a detectable difference:
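static inline unsigned long long aplusb_pow2(unsigned long long a, unsigned long long b) {
    return (a + b) * (a + b);   /* unsigned, so large values wrap instead of overflowing an int */
}
unsigned long long run(void) {
    unsigned long long total = 0;   /* use the result so the calls are not discarded */
    for (unsigned a = 0; a < 900000; ++a)
        for (unsigned b = 0; b < 900000; ++b)
            total += aplusb_pow2(a, b);
    return total;
}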
You can't be absolutely sure the compiler will catch the "critical" sections of your code: use "inline" when you know it matters.
A compiler isn't a profiler.
The difference isn't likely to matter.
Leave inline out until you measure the code performance and determine that you could gain some performance by inlining a specific function that the compiler chose to treat as normal. Even then, there's no guarantee the compiler will inline that function, but at least you did all you could :)
Since it's only a hint to the compiler, the compiler is free to ignore it, and likely will. The compiler has a lot of relevant information you don't have, such as how much of a cache line a loop will take up, and can inline or not on a case-by-case basis.
It's just a hint, so using it is unlikely to hurt anything. You almost certainly should avoid any compiler-specific things that force functions to be inlined or not inlined.
The declaration is pretty much useless and quite different from the original intent. Compilers have taken much more liberty wrt what and what not to inline (IMO).
It's a hint to the compiler, using it will help you and have the intended significance only sometimes.
If you need to write performance-critical programs, do not rely on the compiler alone (knowledge of performance and optimization is not learned in a day). There's usually a way to override the compiler's judgement (not just hint at your preference) and force inlining. This is how I declare a function/method inline more than 95% of the time (knowing also when it is implicit). If and when you need that level of control, employ force-inlining as well, but do learn when and how to use it.
Inlining is not a silver bullet to better performance; it can have negative effects. Abuse of inlining can have some scary consequences in extreme cases, but usually performance is a little worse and binaries are larger when used improperly. Proper use of inlining can have considerably positive results.
Inlining is also helpful to remove symbols which would otherwise be exported, reducing the binary, depending on the number of instances and size.
Another thing: You'll get different linkage with C++.