What does #pragma intrinsic mean? - c

Just want to know what does #pragma intrinsic(_m_prefetchw) mean ?

As far as I am aware, that looks like someone was intending to modify some MSVC++ specific setting. However, that setting is not a valid option for the intrinsic pragma. _m_prefetchw on the other hand is a 3D Now! intrinsic function.
Like all compiler intrinsic functions, it exposes (possibly) faster assembly instructions supported by the underlying hardware to your C or C++ application in a manner
A. more consistent with optimizers, and
B. more consistent with the language, when compared with using inline assembly.
On MSVC on x86_64/x64/amd64 systems, inline assembly is not supported, so one must use such intrinsics to access whizzbang features of the underlying hardware.
Finally, it should be noted that _m_prefetchw is a 3D Now! intrinsic, and 3D Now! is only supported on AMD hardware. It's probably not something you want to use for new code (i.e. you should use SSE instead, which works on both Intel and AMD hardware, and has more features to boot).

The meaning of "#pragma intrinsic" (note spelling), as with all "#pragma" directives, varies from one compiler to another. Generally, it indicates that a particular thing that looks syntactically like a call to an external function should be replaced with some inline code. In some cases, this may greatly improve performance, especially if the compiler can determine constant values for some or all of the arguments (in the latter situation, the compiler may be able to compute the value of the function and replace it with a constant).
Generally, having functions processed as intrinsic won't pose any particular problem. The biggest danger is that if a user defines in one module a function with the same name as one of the compiler's intrinsic function, and attempts to call that function from another module, the compiler might instead replace the function call with its expected instruction sequence. To prevent this, some compilers don't enable intrinsic functions by default (since doing so would cause the above incompatibility with some standard-conforming programs) but provide #pragma directives to do enable them. Compilers may also use command-line option to enable intrinsics (since the standard allows anything there), or may define some functions like __memcpy() as intrinsic, and within string.h, use a #define directive to convert memcpy into __memcpy (since programs that #include string.h are not allowed to use memcpy for any other purpose).

In C, it depends on whether the implementation recognizes (and defines) it.
If the implementation does not recognize the "intrinsic" preprocessing token, the pragma is ignored.
If the implementation recognizes it, whatever is defined will happen (and if another implementation defines it differently, a different thing happens on the other implementation).
So, check the documentation for the implementation you're talking about (edit: and don't use it if you expect to compile your source on different implementations).
I couldn't find any reference to "#pragma intrinsic" in man gcc, on my system.

The intrinsic pragma tells the compiler that a function has known behavior. The compiler may call the function and not replace the function call with inline instructions, if it will result in better performance.
Source: http://msdn.microsoft.com/en-us/library/tzkfha43(VS.80).aspx

Related

What remains in C if I exclude libraries and compiler extensions?

Imagine a situation where you can't or don't want to use any of the libraries provided by the compiler as "standard", nor any external library. You can't use even the compiler extensions (such as gcc extensions).
What is the remaining part you get if you strip C language of all the things a lot of people use as a matter of course?
In such a way, probably a list of every callable function supported by any big C compiler (not only ANSI C) out-of-box would be satisfying as as answer as it'd at least approximately show the use-case of the language.
First I thought about sizeof() and printf() (those were already clarified in the comments - operator + stdio), so... what remains? In-line assembly seem like an extension too, so that pretty much strips even the option to use assembly with C if I'm right.
Probably in the matter of code it'd be easier to understand. Imagine a code compiled with only e.g. gcc main.c (output flag permitted) that has no #include, nor extern.
int main() {
// replace_me
return 0;
}
What can I call to actually do something else than "boring" type math and casting from type to type?
Note that switch, goto, if, loops and other constructs that do nothing and only allow repeating a piece of code aren't the thing I'm looking for (if it isn't obvious).
(Hopefully the edit clarified wtf I'm actually asking, but Matteo's answer pretty much did it.)
If you remove all libraries essentially you have something similar to a freestanding implementation of C (which still has to provide some libraries - say, string.h, but that's nothing you couldn't easily implement yourself in portable C), and that's what normally you start with when programming microcontrollers and other computers that don't have a ready-made operating system - and what operating system writers in general use when they compile their operating systems.
There you typically have two ways of doing stuff besides "raw" computation:
assembly blocks (where you can do literally anything the underlying machine can do);
memory mapped IO (you set a volatile pointer to some hardware dependent location and read/write from it; that affects hardware stuff).
That's really all you need to build anything - and after all, it all boils down to that stuff anyway, the C library of a regular hosted implementation is normally written in C itself, with some assembly used either for speed or to communicate with the operating system1 (typically the syscalls are invoked through some kind of interrupt).
Again, it's nothing you couldn't implement yourself. But the point of having a standard library is both to avoid to continuously reinvent the wheel, and to have a set of portable functions that spare you to have to rewrite everything knowing the details of each target platform.
And mainstream operating systems, in turn, are generally written in a mix or C and assembly as well.
C has no "built-in" functions as such. A compiler implementation may include "intrinsic" functions that are implemented directly by the compiler without provision of an external library, although a prototype declaration is still required for intrinsics, so you would still normally include a header file for such declarations.
C is a systems-level language with a minimal run-time and start-up requirement. Because it can directly access memory and memory mapped I/O there is very little that it cannot do (and what it cannot do is what you use assembly, in-line assembly or intrinsics for). For example, much of the library code you are wondering what you can do without is written in C. When running in an OS environment however (using C as an application-level rather then system-level language), you cannot practically use C in that manner - the OS has control over such things as I/O and memory-management and in modern systems will normally prevent unmediated access to such resources. Of course that OS itself is likely to largely written in C (and/or C++).
In a standalone of bare-metal environment with no OS, C is often used very early in the bootstrap process initialising hardware and establishing an application execution environment. In fact on ARM Cortex-M processors it is possible to boot directly into C code from reset, since the hardware loads an initial stack-pointer and start address from the vector table on start-up; this being enough to run C code that does not rely on library or static data initialisation - such initialisation can however be written in C before calling main().
Note that sizeof is not a function, it is an operator.
I don't think you really understand the situation.
You don't need a header to call a function in C. You can call with unchecked parameters - a bad idea and an obsolete feature, but still supported. And if a compiler links a library by default instead of only when you explicitly tell it to, that's only a little switch within the compiler to "link libc". Notoriously Unix compilers need to be told to link the math library, it wasn't linked by default because some very early programs didn't use floating point.
To be fair, some standard library functions like memcpy tend to be special-cased these days as they lend themselves to inlining and optimisation.
The standard library is documented and is usually available, though in effect deprecated by Microsoft for security reasons. You can write pretty much any function quite easily with only stdlib functions, what you can't do is fancy IO.

gcc intrinsic vs inline assembly : which is better?

If I want to expose a single machine specific instruction to the programmer, there are two ways I can do so :
Define a new builtin / intrinsic
Expose the same as inline assembly asm() [As its a single arithmetic type instruction, I believe there is no need for asm volatile()]
I have read that builtins allow the compiler to take care of the type checking, register allocation and "other optimizations" etc. But the compiler will need to do this even in case of asm (), right ? So what precisely is the performance benefit of using intrinsic over asm () for a single instruction ?
How does the equation change if there are multiple machine instructions involved ?
The "portability" argument in favor of intrinsic is understandable, but I am curious to understand the performance advantage, if any, of one over the other.
I think it depends a lot on what you're doing. Modifying GCC, and requiring a modified GCC to build your program unless/until your GCC patch makes it upstream, is a lot more of a headache than just using inline asm.
If the instruction you want to use has an abstract meaning not tied down to a particular instruction set architecture, adding the builtin/intrinsic so that the same code using it could automatically work on all targets (with fallback to a more complex implementation with multiple instructions on targets that don't have the instruction) is probably the "right" choice, but might not be practical still.
If the instruction is something very ISA-specific, obscure, not-performance-critical, etc. (I'm thinking of loading a special hardware register, cpu mode register, getting model info, etc. but I'm sure you can think of other examples) then just using inline asm is almost certainly the right solution.
Even if you do think a builtin is the "right" solution for your problem, but need to take the inline asm approach for practical reasons, you can still abstract it with a macro or static inline function in such a way that it's easy to replace all uses with an intrinsic later (or with a fallback implementation on targets without the instruction).

__forceinline__ effect at CUDA C __device__ functions

There is a lot of advice on when to use inline functions and when to avoid it in regular C coding. What is the effect of __forceinline__ on CUDA C __device__ functions? Where should they be used and where be avoided?
Normally the nvcc device code compiler will make it's own decisions about when to inline a particular __device__ function and generally speaking, you probably don't need to worry about overriding that with the __forceinline__ decorator/directive.
cc 1.x devices don't have all the same hardware capabilities as newer devices, so very often the compiler will automatically inline functions for those devices.
I think the reason to specify __forceinline__ is the same as what you may have learned about host C code. It is usually used for optimization when the compiler might not otherwise inline the function (e.g. on cc 2.x or newer devices). This optimization (i.e. function call overhead) might be negligible if you were calling the function once, but if you were calling the function in a loop for example, making sure it was inlined might give noticeable improvement in code execution.
As a counter example, inlining and recursion generally have contra-indications. For a recursive function that calls itself, I don't think it's possible to handle arbitrary recursion and also strict inlining. So if you intend to use a function recursively (supported in cc 2.x and above) you probably wouldn't want to specify __forceinline__.
In general, I think you should let the compiler manage this for you. It will intelligently decide whether to inline a function.

How to use the __attribute__ keyword in GCC C?

I am not clear with use of __attribute__ keyword in C.I had read the relevant docs of gcc but still I am not able to understand this.Can some one help to understand.
__attribute__ is not part of C, but is an extension in GCC that is used to convey special information to the compiler. The syntax of __attribute__ was chosen to be something that the C preprocessor would accept and not alter (by default, anyway), so it looks a lot like a function call. It is not a function call, though.
Like much of the information that a compiler can learn about C code (by reading it), the compiler can make use of the information it learns through __attribute__ data in many different ways -- even using the same piece of data in multiple ways, sometimes.
The pure attribute tells the compiler that a function is actually a mathematical function -- using only its arguments and the rules of the language to arrive at its answer with no other side effects. Knowing this the compiler may be able to optimize better when calling a pure function, but it may also be used when compiling the pure function to warn you if the function does do something that makes it impure.
If you can keep in mind that (even though a few other compilers support them) attributes are a GCC extension and not part of C and their syntax does not fit into C in an elegant way (only enough to fool the preprocessor) then you should be able to understand them better.
You should try playing around with them. Take the ones that are more easily understood for functions and try them out. Do the same thing with data (it may help to look at the assembly output of GCC for this, but sizeof and checking the alignment will often help).
Think of it as a way to inject syntax into the source code, which is not standard C, but rather meant for consumption of the GCC compiler only. But, of course, you inject this syntax not for the fun of it, but rather to give the compiler additional information about the elements to which it is attached.
You may want to instruct the compiler to align a certain variable in memory at a certain alignment. Or you may want to declare a function deprecated so that the compiler will automatically generate a deprecated warning when others try to use it in their programs (useful in libraries). Or you may want to declare a symbol as a weak symbol, so that it will be linked in only as a last resort, if any other definitions are not found (useful in providing default definitions).
All of this (and more) can be achieved by attaching the right attributes to elements in your program. You can attach them to variables and functions.
Take a look at this whole bunch of other GCC extensions to C. The attribute mechanism is a part of these extensions.
There are too many attributes for there to be a single answer, but examples help.
For example __attribute__((aligned(16))) makes the compiler align that struct/function on a 16-bit stack boundary.
__attribute__((noreturn)) tells the compiler this function never reaches the end (e.g. standard functions like exit(int) )
__attribute__((always_inline)) makes the compiler inline that function even if it wouldn't normally choose to (using the inline keyword suggests to the compiler that you'd like it inlining, but it's free to ignore you - this attribute forces it).
Essentially they're mostly about telling the compiler you know better than it does, or for overriding default compiler behaviour on a function by function basis.
One of the best (but little known) features of GNU C is the attribute mechanism, which allows a developer to attach characteristics to function declarations to allow the compiler to perform more error checking. It was designed in a way to be compatible with non-GNU implementations, and we've been using this for years in highly portable code with very good results.
Note that attribute spelled with two underscores before and two after, and there are always two sets of parentheses surrounding the contents. There is a good reason for this - see below. Gnu CC needs to use the -Wall compiler directive to enable this (yes, there is a finer degree of warnings control available, but we are very big fans of max warnings anyway).
For more information please go to http://unixwiz.net/techtips/gnu-c-attributes.html
Lokesh Venkateshiah

inline a function inside another inline function in C

I currently have inline functions calling another inline function (a simple 4 lines big getAbs() function). However, I discovered by looking to the assembler code that the "big" inline functions are well inlined, but the compiler use a bl jump to call the getAbs() function.
Is it not possible to inline a function in another inline function? By the way, this is embedded code, we are not using the standard libraries.
Edit : The compiler is WindRiver, and I already checked that inlining would be beneficial (4 instructions instead of +-40).
Depending on what compiler you are using you may be able to encourage the compiler to be less reluctant to inline, e.g. with gcc you can use __attribute__ ((always_inline)), with Intel ICC you can use icc -inline-level=1 -inline-forceinline, and with Apple's gcc you can use gcc -obey-inline.
The inline keyword is a suggestion to the compiler, nothing more. It's free to take that suggestion on board, totally ignore it or even lie to you and tell that it's doing it while it's really not.
The only way to force code to be inline is to, well, write it inline. But, even, then the compiler may decide it knows better and decide to shift it out to another function. It has a lot of leeway in generating executable code for your particular source, provided it doesn't change the semantics of it.
Modern compilers are more than capable of generating better code than most developers would hand-craft in assembly. I think the inline keyword should go the same path as the register keyword.
If you've seen the output of gcc at its insane optimisation level, you'll understand why. It has produced code that I wouldn't have dreamed possible, and that took me a long time to understand.
As an aside, check this out for what optimisations that gcc actually has, including a great many containing the text "inline" or "inlining".
#gramm: There's quite a few scenarios in which inline isn't necessarily to your benefit. Most compilers use some very advanced heuristics to determine when to inline. When discussing inlining, the simplest idea is, trust your compiler to produce the fastest code.
I have recently had a very similar problem, reading this post has given me a wackky idea. Why not Have a simple pre-compilation (a simple reg ex should do the job ) code parser that parses out the function call to actually put the source code in-line. use a tag such as /inline/ /end_of_inline/ so that you can use normal ide features (if you are or might use an ide.
Include this in your build process, that way you have the readability advantage as well as removing the compilers assumption that you are only as good a developer as most and do not understand when to in-line.
Nonetheless before trying this you should probably go through the compilers command line options.
I would suggest that if your getAbs() function (sounds like absolute value but you really should be showing us code with the question...) is 4 lines long, then you have much bigger optimizations to worry about than whether the code gets inlined or not.

Resources