Can I put pragma `Inline` in the body instead of the spec?

Ada Information Clearinghouse states the following:
The use of pragma Inline does have its disadvantages. It can create compilation dependencies on the body; that is, when the specification uses a pragma Inline, both the specification and corresponding body may need to be compiled before the specification can be used.
Does putting pragma Inline in the body avoid this problem?

The advantage is that Inline in the specification allows for cross-unit inlining, which can be a very powerful run-time optimization.
The disadvantage you mention matters mostly when you compile on a machine that is slow or has few cores.
So it is a run-time versus compile-time trade-off.
Note that on GNAT, cross-unit inlining is enabled by a single switch (-gnatn), so don't be put off by the Inline pragma creating compilation dependencies: you can turn the whole mechanism on or off with that switch.

Related

Does function inlining use more RAM or ROM? and what is effect on micro-controller's RAM?

When I write an inline function in my code (bare-metal code for an STM32), I know that every call to it is replaced with its definition, which saves the overhead of a function call, i.e. it saves stack.
Now I am confused about the RAM and ROM usage of an inline function.
Can anyone please relate the memory usage of an inline function to RAM and ROM usage, especially in the context of bare-metal code?
Your linker will be able to generate a link map with a summary of ROM and RAM usage - you can build with and without inlining and see the result for yourself.
Inlining replaces calls to a single copy of the code with copies of that code, so the code space (ROM in your case) generally increases. It has little effect on RAM, although it may reduce stack usage by a small amount because no return address is required. That is a run-time reduction and will not show in the link map.
It will only make a difference if your compiler chooses to honour the inline request. GCC, for example, will not do so at the default -O0 optimisation level; at higher levels it still may not do so in all circumstances, and it may even inline code that is not explicitly marked for inlining.
Your compiler may have a means to force inlining, but the inline keyword is not it. In GCC, for example, you would use the __attribute__((always_inline)) function attribute. However, second-guessing the compiler about what should and should not be inlined is generally a bad idea with a modern optimising compiler. On a code base of any significant size it will generally make better, more holistic decisions with no developer effort.
If you run your code from the FLASH memory, the length of the code does not affect your RAM usage. The code will be longer of course if the function is inlined more than once (but it is not guaranteed by the 'inline' keyword).
Another question is SRAM usage for local variable storage. It will usually be lower, because inlining allows more aggressive optimizations (functions are inlined only when optimizations are on; otherwise they will not be inlined).
The 'inline' keyword is only a suggestion to the compiler. If you want to force inlining, you need to use the appropriate attribute or pragma; for gcc that is __attribute__((always_inline)).
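As a rough illustration of the difference between the plain inline hint and the GCC always_inline attribute mentioned in the answers, here is a minimal sketch. The names FORCE_INLINE, clamp_u8, clamp_u8_forced and scale_pixel are hypothetical and not taken from any STM32 library; the attribute is applied only when a GCC-compatible compiler is detected.

```c
#include <stdint.h>

/* FORCE_INLINE is a hypothetical helper macro for this sketch.
 * On GCC-compatible compilers it adds the always_inline attribute;
 * elsewhere it falls back to a plain 'inline' hint. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Plain hint: the compiler decides whether each call is inlined. */
static inline uint8_t clamp_u8(int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return (uint8_t)value;
}

/* Same logic, but GCC is asked to inline every call. */
FORCE_INLINE uint8_t clamp_u8_forced(int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return (uint8_t)value;
}

/* Each call site of an inlined function gets its own copy of the body,
 * which is where the extra ROM (code space) comes from. */
uint8_t scale_pixel(uint8_t pixel, int gain)
{
    return clamp_u8_forced((int)pixel * gain);
}
```

Building such a file with and without optimisation and comparing the link maps, as suggested above, is the reliable way to see what the compiler actually did.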

C inline all functions in source file

In my project I have some "Interface" functions which are declared in my header files. But I have many more functions which are only used internally in the same source file, so they have no declaration in any header file.
Is it good practice to declare them as inline? Since inline functions are better in performance it sounds like a good idea to me. Or are there any drawbacks? I know that the executable file might get larger in size but this is ok for me.
Local functions, not shared in different units of translations, should be qualified as static.
The inline specifier asks the compiler to inline the function if it can, but does not force it. If you require inlining, use a compiler-specific extension such as __forceinline (MSVC) or __attribute__((always_inline)) (GCC).
That said, inlining is not always the best solution, so check the overall performance of your code carefully. Inlining merges the function's code into the calling code, which then needs enough free registers to do its calculations without disturbing the caller. Sometimes this register juggling forces current values into temporary storage, which can hurt the overall efficiency of the code.
Inlining pays off when the function is short and its body takes less time to execute than the call prologue and epilogue would.
Don't hesitate, just tag them as static inline as you want.
Note that a function marked inline is not necessarily inlined. It is left to the compiler to judge whether to inline it. Modern compilers are "clever" enough, so simply let your compiler make the decision; it can be trusted.
inline is just a hint to the compiler. If you specify inline the compiler may choose to not honour that hint. If you do not specify inline the compiler may choose to inline the function.
See 6.7.4 (p6) in the C11 Standard.
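A minimal sketch of the layout discussed in these answers, assuming a source file with one "interface" function and two file-local helpers. All names (is_digit, digit_value, parse_uint) are hypothetical. One helper is marked static inline and the other only static, since the compiler is free to inline either.

```c
#include <stddef.h>

/* File-local helper: 'static' gives internal linkage,
 * 'inline' is only a hint to the compiler. */
static inline int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Also file-local, without the hint; an optimising compiler
 * may still choose to inline it. */
static int digit_value(char c)
{
    return c - '0';
}

/* "Interface" function: the only one that would be declared in a header.
 * Parses leading decimal digits and stops at the first other character. */
int parse_uint(const char *text)
{
    int result = 0;
    for (size_t i = 0; text[i] != '\0' && is_digit(text[i]); ++i) {
        result = result * 10 + digit_value(text[i]);
    }
    return result;
}
```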

__forceinline__ effect at CUDA C __device__ functions

There is a lot of advice on when to use inline functions and when to avoid it in regular C coding. What is the effect of __forceinline__ on CUDA C __device__ functions? Where should they be used and where be avoided?
Normally the nvcc device code compiler will make its own decisions about when to inline a particular __device__ function and, generally speaking, you probably don't need to worry about overriding that with the __forceinline__ decorator/directive.
Compute capability (cc) 1.x devices don't have all the hardware capabilities of newer devices, so the compiler will very often inline functions automatically for those devices.
I think the reason to specify __forceinline__ is the same as what you may have learned about host C code. It is usually used for optimization when the compiler might not otherwise inline the function (e.g. on cc 2.x or newer devices). This optimization (i.e. function call overhead) might be negligible if you were calling the function once, but if you were calling the function in a loop for example, making sure it was inlined might give noticeable improvement in code execution.
As a counter example, inlining and recursion generally have contra-indications. For a recursive function that calls itself, I don't think it's possible to handle arbitrary recursion and also strict inlining. So if you intend to use a function recursively (supported in cc 2.x and above) you probably wouldn't want to specify __forceinline__.
In general, I think you should let the compiler manage this for you. It will intelligently decide whether to inline a function.

Single Source Code vs Multiple Files + Libraries

How much effect does having multiple files or compiled libraries vs. throwing everything (>10,000 LOC) into one source have on the final binary? For example, instead of linking a Boost library separately, I paste its code, along with my original source, into one giant file for compilation. And along the same line, instead of feeding several files into gcc, pasting them all together, and giving only that one file.
I'm interested in the optimization differences, instead of problems (horror) that would come with maintaining a single source file of gargantuan proportions.
Granted, there can only be link-time optimization (I may be wrong), but is there a lot of difference between optimization possibilities?
If the compiler can see all the source code and has some kind of interprocedural optimization (IPO) option turned on, it can optimize better. IPO differs from other compiler optimizations because it analyzes the entire program; other optimizations look at only a single function, or even a single block of code.
Here are some of the interprocedural optimizations that can be done:
Inlining
Constant propagation
mod/ref analysis
Alias analysis
Forward substitution
Routine key-attribute propagation
Partial dead call elimination
Symbol table data promotion
Dead function elimination
Whole program analysis
GCC supports this kind of optimization.
Interprocedural optimization can analyze and optimize the function being called.
If the compiler cannot see the source code of a library function, it cannot perform such optimization.
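To make the first two items of the list concrete (inlining and constant propagation), here is a small sketch with hypothetical names (apply_gain, process_sample). It assumes the callee is visible to the optimizer, either because it sits in the same translation unit as here, or because whole-program or link-time optimization is enabled. Whether a given compiler performs the transformation is up to that compiler.

```c
/* Callee: visible to the optimizer because it lives in the same file.
 * If it were hidden inside a precompiled library, the caller below
 * could only emit an ordinary call to it. */
static int apply_gain(int sample, int gain, int use_offset)
{
    int scaled = sample * gain;
    if (use_offset) {
        scaled += 100;
    }
    return scaled;
}

/* Caller: gain and use_offset are constants at this call site.
 * An optimizer that sees apply_gain() may inline it, propagate the
 * constants 4 and 0, and drop the dead branch, leaving roughly
 * 'return sample * 4;'. */
int process_sample(int sample)
{
    return apply_gain(sample, 4, 0);
}
```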
Note that some modern compilers (clang/LLVM, icc and, more recently, gcc) now support link-time optimization (LTO) to minimize the effect of separate compilation. You thus gain the benefits of separate compilation (maintenance, faster compilation, etc.) as well as those of whole-program analysis.
By the way, it seems like gcc has supported -fwhole-program and --combine since version 4.1. You have to pass all source files together, though.
Finally, since BOOST is mostly header files (templates) that are #included, you cannot gain anything from adding these to your source code.

What does #pragma intrinsic mean?

I just want to know what #pragma intrinsic(_m_prefetchw) means.
As far as I am aware, that looks like someone was intending to modify some MSVC++ specific setting. However, that setting is not a valid option for the intrinsic pragma. _m_prefetchw on the other hand is a 3D Now! intrinsic function.
Like all compiler intrinsic functions, it exposes (possibly) faster assembly instructions supported by the underlying hardware to your C or C++ application in a manner
A. more consistent with optimizers, and
B. more consistent with the language, when compared with using inline assembly.
On MSVC on x86_64/x64/amd64 systems, inline assembly is not supported, so one must use such intrinsics to access whizzbang features of the underlying hardware.
Finally, it should be noted that _m_prefetchw is a 3D Now! intrinsic, and 3D Now! is only supported on AMD hardware. It's probably not something you want to use for new code (i.e. you should use SSE instead, which works on both Intel and AMD hardware, and has more features to boot).
The meaning of "#pragma intrinsic" (note spelling), as with all "#pragma" directives, varies from one compiler to another. Generally, it indicates that a particular thing that looks syntactically like a call to an external function should be replaced with some inline code. In some cases, this may greatly improve performance, especially if the compiler can determine constant values for some or all of the arguments (in the latter situation, the compiler may be able to compute the value of the function and replace it with a constant).
Generally, having functions processed as intrinsics won't pose any particular problem. The biggest danger is that if a user defines in one module a function with the same name as one of the compiler's intrinsic functions, and attempts to call that function from another module, the compiler might instead replace the function call with its expected instruction sequence. To prevent this, some compilers don't enable intrinsic functions by default (since doing so would cause the above incompatibility with some standard-conforming programs) but provide #pragma directives to enable them. Compilers may also use command-line options to enable intrinsics (since the standard allows anything there), or may define some functions like __memcpy() as intrinsic and, within string.h, use a #define directive to convert memcpy into __memcpy (since programs that #include string.h are not allowed to use memcpy for any other purpose).
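As a hedged sketch of how such a pragma is typically written, the snippet below requests the intrinsic form of memcpy on MSVC and leaves other compilers untouched. The guard macro _MSC_VER is MSVC's own; the function names store_u32 and load_u32 are hypothetical. It assumes memcpy is among the functions your MSVC version accepts in #pragma intrinsic, so check the compiler documentation before relying on it.

```c
#include <stdint.h>
#include <string.h>

/* Only MSVC is asked to treat memcpy as an intrinsic here; the pragma
 * is compiler-specific, so it is hidden from every other compiler. */
#if defined(_MSC_VER)
#pragma intrinsic(memcpy)
#endif

/* Copies a 32-bit value into a byte buffer. With a constant size the
 * compiler may replace the memcpy call with a few inline instructions. */
void store_u32(unsigned char *dest, uint32_t value)
{
    memcpy(dest, &value, sizeof value);
}

/* Reads the value back using the same byte order as store_u32(). */
uint32_t load_u32(const unsigned char *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

The code behaves the same whether or not the compiler honours the pragma; only the generated instructions may differ.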
In C, it depends on whether the implementation recognizes (and defines) it.
If the implementation does not recognize the "intrinsic" preprocessing token, the pragma is ignored.
If the implementation recognizes it, whatever is defined will happen (and if another implementation defines it differently, a different thing happens on the other implementation).
So, check the documentation for the implementation you're talking about (edit: and don't use it if you expect to compile your source on different implementations).
I couldn't find any reference to "#pragma intrinsic" in man gcc, on my system.
The intrinsic pragma tells the compiler that a function has known behavior. The compiler may call the function and not replace the function call with inline instructions, if it will result in better performance.
Source: http://msdn.microsoft.com/en-us/library/tzkfha43(VS.80).aspx

Resources